Introduction to Digital Signal Processing


Information, Signals and Systems 


Signal processing is concerned primarily with signals and with systems that
operate on signals to extract useful information. In this course our concept
of a “signal” will be very broad, encompassing virtually any data that can be
represented as an organized collection.


Example: 


• A continuous function $f(t)$
• A sequence of discrete data points $f[n]$
• A multi-dimensional array of data
• Audio, images, video, voltage of an antenna
• Stock prices, potassium concentration in a neuron


Our concept of a “system” will be a black box that takes a signal as input 
and provides another signal as output. 


Example: 


• Analog-to-digital converters (ADCs)
• Filters
• Decimators/Interpolators
• Matched filters
• Face recognition systems


In this course we will approach signal processing from the point of view
that signals are vectors living in an appropriate vector space, and systems
are operators that map signals from one vector space to another. This allows
us to use a common mathematical framework to talk about how to:

• represent signals
• measure similarity/distance between signals
• transform signals from one representation to another
• understand the operation of linear systems on the signals


Since the focus of this course is on digital signal processing, this will also
allow us to use tools from linear algebra to facilitate this understanding.


Digital Signal Processing 


DSP is often presented as an alternative to analog signal processing, i.e., 
instead of a purely analog system as in [link], we can build a digital 
implementation of an analog system as in [link]. This can be advantageous 
since high-precision analog components are expensive (even compared to 
the cost of an ADC/DAC). 




An analog system. 


A digital implementation of an analog system. 


However, the success of DSP derives to a much greater extent from the 
facts that: 


1. Discrete-valued signals can be more robust to noise, as illustrated in 
[link]. In [link](a), noise may be impossible to eliminate, but in [link] 
(b) noise can be eliminated entirely by exploiting the discrete structure 
of the signal. 

2. Once we have a digital, discrete-time signal, we can store it in memory 
and perform highly complex processing. 


(a) An analog signal corrupted with noise; (b) A discrete-valued signal 
corrupted with noise. 


In this course we will consider signal processing systems beyond simple 
LTI filters. Themes of the course include: 


e Signals as vectors, vector space geometry 

e Signal representations and bases 

e Linear systems analysis and linear algebra 

e “Optimality” in signal processing (e.g., optimal filter design) 


Metric Spaces 


We will view signals as elements of certain mathematical spaces. The 
spaces have a common structure, so it will be useful to think of them in the 
abstract. 


Metric Spaces 


Definition 1 
A set is a (possibly infinite) collection of distinct objects. 


Example: 


• The empty set: $\emptyset = \{\}$ (plays a role akin to zero)
• Binary numbers: $\{0, 1\}$
• Natural numbers: $\mathbb{N} = \{1, 2, 3, \dots\}$
• Integers: $\mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, \dots\}$ ($\mathbb{Z}$ is short for “Zahlen”, German for “numbers”)
• Rational numbers: $\mathbb{Q}$ ($\mathbb{Q}$ for “quotient”)
• Real numbers: $\mathbb{R}$
• Complex numbers: $\mathbb{C}$


In this course we will assume familiarity with a number of common set 
operations. In particular, for the sets A = {0,1}, B = {1}, C = {2}, we 
have the operations of: 


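• Union: $A \cup B = \{0, 1\}$, $B \cup C = \{1, 2\}$
• Intersection: $A \cap B = \{1\}$, $B \cap C = \emptyset$
• Exclusion: $A \setminus B = \{0\}$
• Complement: $A^c = U \setminus A$, where $U$ is the universal set; e.g., $A^c = \{2\}$ when $U = \{0, 1, 2\}$
• Cartesian product: $A^2 = A \times A = \{(0,0), (0,1), (1,0), (1,1)\}$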


In order to be useful a set must typically satisfy some additional structure. 
We begin by defining a notion of distance. 


Definition 2 
A metric space is a set $M$ together with a metric (distance function)
$d : M \times M \to \mathbb{R}$ such that for all $x, y, z \in M$

• M1. $d(x, y) = d(y, x)$ (symmetry)
• M2. $d(x, y) \ge 0$ (non-negative)
• M3. $d(x, y) = 0$ iff $x = y$ (positive semi-definite)
• M4. $d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality).
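Example:

• Trivial metric ($M$ is arbitrary): $d(x, y) = 0$ if $x = y$, and $d(x, y) = 1$ if $x \ne y$
• Standard metric ($M = \mathbb{R}$): $d(x, y) = |x - y|$
• Euclidean metric ($M = \mathbb{R}^N$): $d_2(x, y) = \sqrt{\sum_{i=1}^{N} |x_i - y_i|^2}$
• $\ell_1$ metric ($M = \mathbb{R}^N$): $d_1(x, y) = \sum_{i=1}^{N} |x_i - y_i|$
• $\ell_p$ metric ($M = \mathbb{R}^N$): $d_p(x, y) = \left( \sum_{i=1}^{N} |x_i - y_i|^p \right)^{1/p}$
• $\ell_\infty$ metric ($M = \mathbb{R}^N$): $d_\infty(x, y) = \max_{i=1,\dots,N} |x_i - y_i|$
• $L_p$ metric ($M$ = real (or complex) valued functions defined on $[a, b]$): $d_p(x, y) = \left( \int_a^b |x(t) - y(t)|^p \, dt \right)^{1/p}$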
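The finite-dimensional metrics in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `lp_distance` and `trivial_distance` and the sample vectors are hypothetical and do not come from the notes.

```python
import math

def lp_distance(x, y, p):
    """l_p distance between two equal-length sequences; p may be math.inf."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        return max(diffs)                      # l_infinity: largest coordinate gap
    return sum(d ** p for d in diffs) ** (1.0 / p)

def trivial_distance(x, y):
    """Trivial metric: 0 for identical points, 1 otherwise."""
    return 0.0 if x == y else 1.0

# Hypothetical sample points in R^3.
x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
z = [0.0, 0.0, 0.0]

d1 = lp_distance(x, y, 1)           # |3| + |4| + |0|
d2 = lp_distance(x, y, 2)           # sqrt(9 + 16)
dinf = lp_distance(x, y, math.inf)  # max(3, 4, 0)
```

Each of these satisfies M1 through M4; the triangle inequality, for instance, can be spot-checked by comparing `lp_distance(x, z, p)` with `lp_distance(x, y, p) + lp_distance(y, z, p)`.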


Completeness 


Distance functions allow us to talk concretely about limits and convergence 
of sequences. 


Definition 1 

Let $(M, d(x, y))$ be a metric space and $\{x_i\}_{i=1}^{\infty}$ be a sequence of elements
in $M$. We say that $\{x_i\}_{i=1}^{\infty}$ converges to $x^*$ if and only if for every $\epsilon > 0$
there is an $N$ such that $d(x_i, x^*) < \epsilon$ for all $i > N$. In this case we say that
$x^*$ is the limit of $\{x_i\}_{i=1}^{\infty}$.


A sequence of points $\{x_i\}$ converging to $x^*$.


Definition 2 
A sequence $\{x_i\}_{i=1}^{\infty}$ is said to be a Cauchy sequence if for any $\epsilon > 0$ there
is an $N$ such that $d(x_i, x_j) < \epsilon$ for every $i, j > N$.


It can be shown that any convergent sequence is a Cauchy sequence. 
However, it is possible for a Cauchy sequence to not be convergent! 


Example: 

Suppose that $M = (0, 2)$, i.e., the open interval from 0 to 2 on the real
line, and let $d(x, y) = |x - y|$. Consider the sequence defined by $x_i = \frac{1}{i}$.
$\{x_i\}$ is Cauchy since for any $\epsilon$ we can set $N$ such that $\frac{1}{N} < \epsilon$, so that
$|x_i - x_j| < \frac{1}{N} < \epsilon$ for all $i, j > N$. However, $x_i \to 0 \notin (0, 2)$, i.e.,
the sequence converges to something that lives outside of our space.


Example: 

Suppose that $M = C[-1, 1]$ (the set of continuous functions defined on
$[-1, 1]$) and let $d_2$ denote the $L_2$ metric. Consider the sequence of
functions defined by

Equation:
$$f_i(t) = \begin{cases} 0 & \text{if } t < -\frac{1}{i} \\ \frac{it + 1}{2} & \text{if } -\frac{1}{i} \le t \le \frac{1}{i} \\ 1 & \text{if } t > \frac{1}{i}. \end{cases}$$

For $j \ge i$ we have that

Equation:
$$d_2(f_i, f_j)^2 = \frac{(j - i)^2}{6 i j^2} \le \frac{1}{6i}.$$

This goes to 0 for $j, i$ sufficiently large. Thus, the sequence $\{f_i\}_{i=1}^{\infty}$ is
Cauchy, but it converges to a discontinuous function, and thus it is not
convergent in $M$.
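As a sanity check on this example, the squared $L_2$ distance between two of the ramp functions can be approximated numerically and compared with the closed form $(j-i)^2/(6ij^2)$. The sketch below assumes the piecewise-linear definition of $f_i$ (zero below $-1/i$, one above $1/i$, linear in between); the helper names and the midpoint-rule integrator are illustrative choices, not part of the notes.

```python
def f(i, t):
    """Ramp function f_i(t): 0 below -1/i, 1 above 1/i, linear in between."""
    if t < -1.0 / i:
        return 0.0
    if t > 1.0 / i:
        return 1.0
    return (i * t + 1.0) / 2.0

def d2_squared(i, j, n=20000):
    """Midpoint-rule approximation of the integral of |f_i - f_j|^2 over [-1, 1]."""
    h = 2.0 / n
    total = 0.0
    for k in range(n):
        t = -1.0 + (k + 0.5) * h
        diff = f(i, t) - f(j, t)
        total += diff * diff
    return total * h

def d2_squared_closed_form(i, j):
    """Closed form (j - i)^2 / (6 i j^2), written symmetrically in i and j."""
    lo, hi = min(i, j), max(i, j)
    return (hi - lo) ** 2 / (6.0 * lo * hi ** 2)

numeric = d2_squared(2, 5)
exact = d2_squared_closed_form(2, 5)
```

Increasing both indices drives the distance toward zero, which is the Cauchy property; the pointwise limit is a step function, which is not in $C[-1, 1]$.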


Definition 3 


A metric space (M, d(x, y)) is complete if every Cauchy sequence in M is 
convergent in M. 


Example: 
• $M = [0, 1]$, $d(x, y) = |x - y|$ is complete.
• $(C[-1, 1], d_2)$ is not complete, but one can check that $(C[-1, 1], d_\infty)$ is complete. (This space works because using $d_\infty$, the above example is no longer Cauchy.)
• $\mathbb{Q}$ is not complete, but $\mathbb{R}$ is.


Vector Spaces 


Metric spaces impose no requirements on the structure of the set $M$. We
will now consider more structured M, beginning by generalizing the 
familiar concept of a vector. 


Definition 1 
Let $K$ be a field of scalars, i.e., $K = \mathbb{R}$ or $\mathbb{C}$. Let $V$ be a set of vectors
equipped with two binary operations:

1. vector addition: $+ : V \times V \to V$
2. scalar multiplication: $\cdot : K \times V \to V$

We say that $V$ is a vector space (or linear space) over $K$ if

• VS1: $V$ forms a group under addition, i.e.,
  ◦ $(x + y) + z = x + (y + z)$ (associativity)
  ◦ $x + y = y + x$ (commutativity)
  ◦ $\exists\, 0 \in V$ such that $\forall x \in V$, $x + 0 = 0 + x = x$
  ◦ $\forall x \in V$, $\exists\, y$ such that $x + y = 0$

• VS2: For any $\alpha, \beta \in K$ and $x, y \in V$
  ◦ $\alpha(\beta x) = (\alpha\beta)x$ (compatibility)
  ◦ $(\alpha + \beta)(x + y) = \alpha x + \alpha y + \beta x + \beta y$ (distributivity)
  ◦ $\exists\, 1 \in K$ such that $1x = x$

Example: 


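• $\mathbb{R}^N$ over $\mathbb{R}$ (not $\mathbb{R}^N$ over $\mathbb{C}$)
• $\mathbb{C}^N$ over $\mathbb{C}$ or $\mathbb{C}^N$ over $\mathbb{R}$
• Set of polynomials of degree $N$ with rational coefficients over $\mathbb{Q}$
• The set of all infinitely-long sequences of real numbers over $\mathbb{R}$
• $GF(2)^N$: $\{0, 1\}^N$ over $\{0, 1\}$ with mod 2 arithmetic (Galois field)
• $C[a, b]$ over $\mathbb{R}$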


Normed Vector Spaces 


While vector spaces have additional structure compared to a metric space, a 
general vector space has no notion of “length” or “distance.” 


Definition 1 
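Let $V$ be a vector space over $K$. A norm is a function $\|\cdot\| : V \to \mathbb{R}$ such
that

N1. $\|x\| \ge 0 \ \forall x \in V$
N2. $\|x\| = 0$ iff $x = 0$
N3. $\|\alpha x\| = |\alpha| \|x\| \ \forall x \in V, \alpha \in K$
N4. $\|x + y\| \le \|x\| + \|y\| \ \forall x, y \in V$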


A vector space together with a norm is called a normed vector space (or 
normed linear space). 


Example: 


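• $V = \mathbb{R}^N$: $\|x\|_2 = \sqrt{\sum_{i=1}^{N} |x_i|^2}$
• $V = \mathbb{R}^N$: $\|x\|_1 = \sum_{i=1}^{N} |x_i|$ (“Taxicab”/“Manhattan” norm)
• $V = \mathbb{R}^N$: $\|x\|_\infty = \max_{i=1,\dots,N} |x_i|$
• $V = L_p[a, b]$, $p \in [1, \infty)$: $\|x(t)\|_p = \left( \int_a^b |x(t)|^p \, dt \right)^{1/p}$ (The notation $L_p[a, b]$ denotes the set of all functions defined on the interval $[a, b]$ such that this norm exists, i.e., $\|x(t)\|_p < \infty$.)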


Note that any normed vector space is a metric space with induced metric
$d(x, y) = \|x - y\|$. (This follows since
$\|x - y\| = \|x - z + z - y\| \le \|x - z\| + \|y - z\|$.) While a normed
vector space “feels like” a metric space, it is important to remember that it
actually has a great deal of additional structure.


Technical Note: In a normed vector space we must have (from N2) that
$x = y$ if $\|x - y\| = 0$. This can lead to a curious phenomenon when
dealing with continuous-time functions. For example, in $L_2([a, b])$, we can
consider a pair of functions $x(t)$ and $y(t)$ that differ only at a single
point. For such a pair, $\|x(t) - y(t)\|_2 = 0$
(since a single point cannot contribute anything to the value of the integral).
Thus, in order for our norm to be consistent with the axioms of a norm, we
must say that $x = y$ whenever $x(t)$ and $y(t)$ differ only on a set of measure
zero. To reiterate, $x = y \nRightarrow x(t) = y(t) \ \forall t \in [a, b]$, i.e., when we treat
functions as vectors, we will not interpret $x = y$ as pointwise equality, but
rather as equality almost everywhere.


Inner Product Spaces 


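Where normed vector spaces incorporate the concept of length into a vector
space, inner product spaces incorporate the concept of angle.


Definition 1
Let $V$ be a vector space over $K$. An inner product is a function
$\langle \cdot, \cdot \rangle : V \times V \to K$ such that for all $x, y, z \in V$, $\alpha \in K$

• IP1. $\langle x, y \rangle = \overline{\langle y, x \rangle}$
• IP2. $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$
• IP3. $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$
• IP4. $\langle x, x \rangle \ge 0$ with equality iff $x = 0$.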


A vector space together with an inner product is called an inner product 
space. 


Example: 


Note that a valid inner product space induces a normed vector space with
norm $\|x\| = \sqrt{\langle x, x \rangle}$. (The proof relies on the Cauchy-Schwarz inequality.) In
$\mathbb{R}^N$ or $\mathbb{C}^N$, the standard inner product induces the $\ell_2$-norm. We summarize
the relationships between the various spaces introduced over the last few
lectures in [link].


Venn diagram illustrating the relationship between vector and metric spaces
(labeled regions include normed vector spaces and inner product spaces).


Properties of Inner Products 
Inner products and their induced norms have some very useful properties: 


• Cauchy-Schwarz Inequality: $|\langle x, y \rangle| \le \|x\| \|y\|$ with equality iff $\exists\, \alpha \in \mathbb{C}$ such that $y = \alpha x$
• Pythagorean Theorem: $\langle x, y \rangle = 0 \Rightarrow \|x + y\|^2 = \|x - y\|^2 = \|x\|^2 + \|y\|^2$
• Parallelogram Law: $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$
• Polarization Identity: $\mathrm{Re}[\langle x, y \rangle] = \frac{\|x + y\|^2 - \|x - y\|^2}{4}$
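The properties listed above are easy to spot-check numerically. The following sketch assumes the standard inner product on $\mathbb{C}^N$, $\langle x, y \rangle = \sum_i x_i \overline{y_i}$, and uses two arbitrary (hypothetical) test vectors; the helper names are illustrative only.

```python
import math

def inner(x, y):
    """Standard inner product on C^N: sum of x_i * conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product."""
    return math.sqrt(inner(x, x).real)

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

# Hypothetical test vectors in C^3.
x = [1 + 2j, -1j, 3 + 0j]
y = [2 - 1j, 4 + 0j, 1 + 1j]

# Cauchy-Schwarz: this gap should be non-negative.
cs_gap = norm(x) * norm(y) - abs(inner(x, y))

# Parallelogram law: both sides should agree.
par_lhs = norm(add(x, y)) ** 2 + norm(sub(x, y)) ** 2
par_rhs = 2 * norm(x) ** 2 + 2 * norm(y) ** 2

# Polarization identity: recovers the real part of <x, y> from norms alone.
pol = (norm(add(x, y)) ** 2 - norm(sub(x, y)) ** 2) / 4
```

The polarization identity is the interesting one from a design standpoint: it shows that the (real part of the) inner product is determined entirely by the induced norm.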


In $\mathbb{R}^2$ and $\mathbb{R}^3$, we are very familiar with the geometric notion of an angle
between two vectors. For example, if $x, y \in \mathbb{R}^2$, then from the law of
cosines, $\langle x, y \rangle = \|x\| \|y\| \cos\theta$. This relationship depends only on norms
and inner products, so it can easily be extended to any inner product space.


Definition 1
The angle $\theta$ between two vectors $x, y$ in an inner product space is defined by

$$\cos\theta = \frac{\langle x, y \rangle}{\|x\| \|y\|}.$$


Definition 2 
Vectors x, y in an inner product space are said to be orthogonal if 


(x,y) = 0. 


Complete Vector Spaces 


Definition 1
A complete normed vector space is called a Banach space. 


Example: 


• $C[a, b]$ with the $L_\infty$ norm, i.e., $\|f\|_\infty = \operatorname{ess\,sup}_{t \in [a, b]} |f(t)|$, is a Banach space.
• $L_p[a, b] = \{f : \|f\|_p < \infty\}$ for $p \in [1, \infty]$ and $-\infty \le a < b \le \infty$ is a Banach space.
• $\ell_p(\mathbb{N}) = \{\text{sequences } x : \|x\|_p < \infty\}$ for $p \in [1, \infty]$ is a Banach space.
• Any finite-dimensional normed vector space is Banach, e.g., $\mathbb{R}^N$ or $\mathbb{C}^N$ with any norm.
• $C[a, b]$ with the $L_p$ norm for $p < \infty$ is not Banach.


Definition 2 
A complete inner product space is called a Hilbert space. 


Example: 


• $L_2[a, b]$ is a Hilbert space.
• $\ell_2(\mathbb{N})$ is a Hilbert space.
• Any finite-dimensional inner product space is a Hilbert space.


Note that every Hilbert space is Banach, but the converse is not true. Hilbert 
spaces will be extremely important in this course. 


Hilbert Spaces in Signal Processing 


What makes Hilbert spaces so useful in signal processing? In modern signal 
processing, we often represent a signal as a point in high-dimensional 
space. Hilbert spaces are spaces in which our geometric intuition from $\mathbb{R}^2$ is
most trustworthy. As an example, we will consider the approximation 
problem. 


Definition 1. 

A subset $W$ of a vector space $V$ is convex if for all $x, y \in W$ and
$\lambda \in (0, 1)$, $\lambda x + (1 - \lambda) y \in W$.

The Fundamental Theorem of Approximation 


Let A be a nonempty, closed (complete), convex set in a Hilbert space H. 
For any x € H there is a unique point in A that is closest to x, i.e., x has a 
unique “best approximation” in A. 

The best approximation to $x$ in convex set $A$.


Note that in non-Hilbert spaces, this may not be true! The proof is rather 
technical. See Young Chapter 3 or Moon and Stirling Chapter 2. Also 
known as the “closest point property”, this is very useful in compression 
and denoising. 


Linear Combinations of Vectors 


Suppose we have a set of vectors $v_1, v_2, \dots, v_N$ that lie in a vector space $V$.
Given scalars $\alpha_1, \alpha_2, \dots, \alpha_N$, observe that the linear combination
Equation:

$$\alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_N v_N$$

is also a vector in $V$.


Definition 1 
Let $M \subset V$ be a set of vectors in $V$. The span of $M$, written $\mathrm{span}(M)$, is
the set of all linear combinations of the vectors in $M$.


Example: 
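$V = \mathbb{R}^3$,
Equation:
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

$\mathrm{span}(\{v_1, v_2\})$ = the $x_1 x_2$-plane, i.e., for any $x_1, x_2$ we can write
$x_1 = \alpha_1$ and $x_2 = \alpha_1 + \alpha_2$ for some $\alpha_1, \alpha_2 \in \mathbb{R}$.


Illustration of the set of all linear combinations of $v_1$ and $v_2$, i.e., the $x_1 x_2$-plane.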


Example: 
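$V = \{f : f(t) \text{ is periodic with period } 2\pi\}$, $M = \{e^{jkt}\}_{k=-B}^{B}$.


$\mathrm{span}(M)$ = periodic, bandlimited (to $B$) functions, i.e., $f(t)$ such that
$f(t) = \sum_{k=-B}^{B} c_k e^{jkt}$ for some $c_{-B}, c_{-B+1}, \dots, c_0, c_1, \dots, c_B \in \mathbb{C}$.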


Vector Subspaces 


Definition 1
A (non-empty) subset $W$ of $V$ is called a subspace of $V$ if for any $x, y \in W$,
$\mathrm{span}(\{x, y\}) \subseteq W$.


Note that this definition easily implies that:


• $0 \in W$
e W is itself a vector space 


Example: 
Which of these are subspaces? 


[Yes] 
• $V = \mathbb{R}^5$, $W = \{x : x_4 = 0, x_5 = 0\}$ [Yes]
VSR Wa er YN 
• $V = C[0, 1]$, $W = \{\text{polynomials of degree } N\}$ [Yes]
• $V = C[0, 1]$, $W = \{f : f \text{ is bandlimited to } B\}$ [Yes]
• $V = \mathbb{R}^N$, $W = \{x : x \text{ has no more than 5 nonzero components, i.e., } \|x\|_0 \le 5\}$ [No]


Signal Approximation in a Hilbert Space 


We will now revisit “The Fundamental Theorem of Approximation” for the 
extremely important case where our set A is a subspace. Specifically, 
suppose that H is a Hilbert space, and let A be a (closed) subspace of H. 
From before, we have that for any $x \in H$ there is a unique $\hat{x} \in A$ such that
$\hat{x}$ is the closest point in $A$ to $x$. When $A$ is also a subspace, we also have:
The Orthogonality Principle


$\hat{x} \in A$ is the minimizer of $\|x - \hat{x}\|$ if and only if $\hat{x} - x \perp A$, i.e.,
$\langle \hat{x} - x, y \rangle = 0$ for all $y \in A$.


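a. Suppose that $\hat{x} - x \perp A$. Then for any $y \in A$ with $y \ne \hat{x}$,
$\|y - x\|^2 = \|y - \hat{x} + \hat{x} - x\|^2$. Note that $y - \hat{x} \in A$, but $\hat{x} - x \perp A$,
so that $\langle y - \hat{x}, \hat{x} - x \rangle = 0$, and we can apply Pythagoras to obtain
$\|y - x\|^2 = \|y - \hat{x}\|^2 + \|\hat{x} - x\|^2$. Since $y \ne \hat{x}$, we thus have that
$\|y - x\|^2 > \|\hat{x} - x\|^2$. Thus $\hat{x}$ must be the closest point in $A$ to $x$.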


Illustration of the orthogonality principle.


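b. Suppose that $\hat{x}$ minimizes $\|x - \hat{x}\|$. Suppose for the sake of a
contradiction that $\exists\, y \in A$ such that $\|y\| = 1$ and $\langle x - \hat{x}, y \rangle = \delta \ne 0$.
Let $z = \hat{x} + \delta y$ (note that $z \in A$ since $A$ is a subspace).
Equation:

$$\begin{aligned} \|x - z\|^2 &= \|x - \hat{x} - \delta y\|^2 \\ &= \langle x - \hat{x}, x - \hat{x} \rangle - \langle x - \hat{x}, \delta y \rangle - \langle \delta y, x - \hat{x} \rangle + \langle \delta y, \delta y \rangle \\ &= \|x - \hat{x}\|^2 - \overline{\delta}\delta - \delta\overline{\delta} + \delta\overline{\delta} \\ &= \|x - \hat{x}\|^2 - |\delta|^2. \end{aligned}$$

Thus $\|x - z\| < \|x - \hat{x}\|$, contradicting the assumption that $\hat{x}$ minimizes
$\|x - \hat{x}\|$.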


This result suggests that a possible method for finding the best
approximation to a signal $x$ from a vector space $V$ is to simply look for a
vector $\hat{x}$ such that $x - \hat{x} \perp V$. In the coming lectures we will show how to
do this, but it will require a brief review of some concepts from linear
algebra.


Linear Operators 


Definition 1 
A transformation (mapping) L : X — Y from a vector space X to a vector 
space Y (with the same scalar field K_) is a linear transformation if: 


1. $L(\alpha x) = \alpha L(x) \ \forall x \in X, \alpha \in K$
2. $L(x_1 + x_2) = L(x_1) + L(x_2) \ \forall x_1, x_2 \in X$.


We call such transformations linear operators. 


Example: 


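• $X = \mathbb{R}^N$, $Y = \mathbb{R}^M$: $L : \mathbb{R}^N \to \mathbb{R}^M$ is an $M \times N$ matrix
• Fourier transform: $F(x(t)) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t} \, dt$, with $F : L_2(\mathbb{R}) \to L_2(\mathbb{R})$


Let $L : X \to Y$ be an operator (linear or otherwise). The range space $\mathcal{R}(L)$
is
Equation:

$$\mathcal{R}(L) = \{L(x) \in Y : x \in X\}.$$

The null space $\mathcal{N}(L)$, also known as the “kernel”, is
Equation:

$$\mathcal{N}(L) = \{x \in X : L(x) = 0\}.$$


If $L$ is linear, then both $\mathcal{R}(L)$ and $\mathcal{N}(L)$ are subspaces.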


Projections 


Definition 1 
A linear transformation $P : X \to X$ is called a projection if $P(x) = x$
$\forall x \in \mathcal{R}(P)$, i.e., $P(P(x)) = P(x) \ \forall x \in X$.


Example:
$P : \mathbb{R}^3 \to \mathbb{R}^3$, $P(x_1, x_2, x_3) = (x_1, x_2, 0)$


Definition 2

If $P$ is a projection operator on an inner product space $V$, we say that $P$ is
an orthogonal projection if $\mathcal{R}(P) \perp \mathcal{N}(P)$, i.e., $\langle x, y \rangle = 0$
$\forall x \in \mathcal{R}(P)$, $y \in \mathcal{N}(P)$.


If $P$ is an orthogonal projection, then for any $x \in V$ we can write:
Equation:

$$x = Px + (I - P)x$$

where $Px \in \mathcal{R}(P)$ and $(I - P)x \in \mathcal{N}(P)$ (since
$P(I - P)x = Px - P(Px) = Px - Px = 0$).
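The decomposition $x = Px + (I - P)x$ can be illustrated with the coordinate projection $P(x_1, x_2, x_3) = (x_1, x_2, 0)$ from the example. This is a minimal sketch; the test vector and helper names are hypothetical.

```python
def P(x):
    """Projection onto the x1-x2 plane: zero out the third coordinate."""
    return (x[0], x[1], 0.0)

def residual(x):
    """(I - P)x, the part of x that P discards."""
    return tuple(a - b for a, b in zip(x, P(x)))

def dot(x, y):
    """Standard inner product on R^3."""
    return sum(a * b for a, b in zip(x, y))

x = (3.0, -1.0, 2.0)   # hypothetical test vector
px = P(x)              # component in the range of P
r = residual(x)        # component in the null space of P
```

Here `P(px)` returns `px` again (idempotence), `P(r)` is the zero vector, and `dot(px, r)` is zero, so the range and null space are orthogonal and this $P$ is an orthogonal projection.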


Now we see that the solution to our “best approximation in a linear
subspace” problem is an orthogonal projection: we wish to find a $P$ such
that $\mathcal{R}(P) = A$.


The question is now, how can we design such a projection operator? 


Linear Independence 


Definition 1 
A set of vectors $\{v_j\}_{j=1}^{N}$ is said to be linearly dependent if there exists a set
of scalars $\alpha_1, \dots, \alpha_N$ (not all 0) such that
Equation:
$$\sum_{j=1}^{N} \alpha_j v_j = 0.$$


Likewise, if $\sum_{j=1}^{N} \alpha_j v_j = 0$ only when $\alpha_j = 0 \ \forall j$, then $\{v_j\}_{j=1}^{N}$ is said to
be linearly independent.


Example: 
17 ss Re 
Equation: 


Find $\alpha_1, \alpha_2, \alpha_3$ such that $\alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 = 0$.
[$\alpha_1 = 1, \alpha_2 = -3, \alpha_3 = 1$.] Note that any two of these vectors are linearly
independent.


Note that if a set of vectors $\{v_j\}_{j=1}^{N}$ is linearly dependent then we can
remove vectors from the set without changing the span of the set.


Bases 


Definition 1 
A basis of a vector space V is a set of vectors B such that 


e span(B) = V. 
e Bis linearly independent. 


The second condition ensures that all bases of V will have the same size. In 


fact, the dimension of a vector space V is defined as the number of 
elements required in a basis for $V$. (This could easily be infinite.)


Example: 


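• $\mathbb{R}^N$ with $B$ the “standard basis” for $\mathbb{R}^N$

Equation:
$$\{b_1, b_2, \dots, b_N\} = \left\{ \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \dots, \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \right\}$$

Note that this easily extends to $\ell_p(\mathbb{Z})$.

• $\mathbb{R}^N$ with any set of $N$ linearly independent vectors
• $V = \{\text{polynomials of degree at most } p\}$, $B = \{1, t, t^2, \dots, t^p\}$ (Note that the dimension of $V$ is $p + 1$)
• $V = \{f(t) : f(t) \text{ is periodic with period } T\}$, $B = \{e^{j 2\pi k t / T}\}_{k=-\infty}^{\infty}$ (Fourier series, infinite dimensional)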


Orthogonal Bases 


Definition 1 
A collection of vectors B in an inner product space V is called an orthogonal 
basis if 


1. span(B) = V 
2. $\langle v_i, v_j \rangle = 0$ for all $i \ne j$


If, in addition, the vectors are normalized under the induced norm, i.e.,
$\|v_i\| = 1 \ \forall i$, then we call $B$ an orthonormal basis (or “orthobasis”). If $V$ is
infinite dimensional, we need to be a bit more careful with 1. Specifically, we
really only need the closure of $\mathrm{span}(B)$ to equal $V$. In this case any $x \in V$ can
be written as

Equation:

$$x = \sum_{i=1}^{\infty} c_i v_i$$

for some sequence of coefficients $\{c_i\}_{i=1}^{\infty}$.


(This last point is a technical one since the span is typically defined as the set of 
linear combinations of a finite number of vectors. See Young Ch 3 and 4 for the 
details. This won't affect too much so we will gloss over the details.) 


Example: 


e V = R’, standard basis 
Equation: 


Example: 


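• Suppose $V = \{\text{piecewise constant functions on } [0, \tfrac{1}{4}), [\tfrac{1}{4}, \tfrac{1}{2}), [\tfrac{1}{2}, \tfrac{3}{4}), [\tfrac{3}{4}, 1]\}$. An example of such a function is illustrated below.

[Figure: an example piecewise constant function $f(t)$.]

Consider the set

[Figure: the functions $v_1(t), \dots, v_4(t)$.]

The vectors $\{v_1, v_2, v_3, v_4\}$ form an orthobasis for $V$.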
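• Suppose $V = L_2[-\pi, \pi]$. $B = \left\{ \frac{1}{\sqrt{2\pi}} e^{jkt} \right\}_{k=-\infty}^{\infty}$, i.e., the Fourier series
basis vectors, form an orthobasis for $V$. To verify the orthogonality of the
vectors, note that:

Equation:

$$\begin{aligned} \left\langle \frac{1}{\sqrt{2\pi}} e^{jk_1 t}, \frac{1}{\sqrt{2\pi}} e^{jk_2 t} \right\rangle &= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j(k_1 - k_2)t} \, dt \\ &= \frac{1}{2\pi} \left. \frac{e^{j(k_1 - k_2)t}}{j(k_1 - k_2)} \right|_{-\pi}^{\pi} \\ &= \frac{1}{2\pi} \cdot \frac{e^{j(k_1 - k_2)\pi} - e^{-j(k_1 - k_2)\pi}}{j(k_1 - k_2)} = 0 \quad (k_1 \ne k_2), \end{aligned}$$

since $e^{j(k_1 - k_2)\pi}$ and $e^{-j(k_1 - k_2)\pi}$ both equal $(-1)^{k_1 - k_2}$.

See Young for proof that the closure of $\mathrm{span}(B)$ is $L_2[-\pi, \pi]$, i.e., the fact that
any $f \in L_2[-\pi, \pi]$ has a Fourier series representation.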
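The orthonormality of the Fourier series basis vectors $v_k(t) = \frac{1}{\sqrt{2\pi}} e^{jkt}$ on $[-\pi, \pi]$ can also be checked numerically. The sketch below approximates the $L_2$ inner product with a midpoint rule; the function names and the number of sample points are illustrative assumptions.

```python
import cmath
import math

def v(k, t):
    """Fourier basis function v_k(t) = exp(j k t) / sqrt(2 pi)."""
    return cmath.exp(1j * k * t) / math.sqrt(2 * math.pi)

def inner_l2(k1, k2, n=4096):
    """Midpoint-rule approximation of <v_k1, v_k2> on [-pi, pi]."""
    h = 2 * math.pi / n
    total = 0j
    for m in range(n):
        t = -math.pi + (m + 0.5) * h
        total += v(k1, t) * v(k2, t).conjugate()
    return total * h

same = inner_l2(3, 3)        # expected to be close to 1
different = inner_l2(3, -2)  # expected to be close to 0
```

Because the integrand is a complex exponential over a whole number of periods, the midpoint rule is accurate to rounding error here as long as `n` exceeds `|k1 - k2|`.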


Computing the Best Approximation 


Recall that if P is an orthogonal projection onto a subspace A, we can write 
any x as 
Equation: 


$$x = Px + (I - P)x$$


where $Px \in A$ and $(I - P)x \perp A$. We now turn to how to actually find $P$.


We begin with the finite-dimensional case, assuming that $\{v_1, \dots, v_N\}$ is a
basis for $A$. If $(I - P)x \perp A$ then we have that for any $x$
Equation:

$$\langle (I - P)x, v_j \rangle = 0 \quad \text{for } j = 1, \dots, N.$$


We also note that since $Px \in A$, we can write $Px = \sum_{k=1}^{N} c_k v_k$. Thus we
obtain
Equation:

$$\left\langle x - \sum_{k=1}^{N} c_k v_k, v_j \right\rangle = 0 \quad \text{for } j = 1, \dots, N$$


from which we obtain

Equation:
$$\langle x, v_j \rangle = \sum_{k=1}^{N} c_k \langle v_k, v_j \rangle \quad \text{for } j = 1, \dots, N.$$

We know $x$ and $v_1, \dots, v_N$. Our goal is to find $c_1, \dots, c_N$. Note that a
procedure for calculating $c_1, \dots, c_N$ for any given $x$ is equivalent to one that
computes $Px$.


To find $c_1, \dots, c_N$, observe that [link] represents a set of $N$ equations with
$N$ unknowns.


Equation: 
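$$\begin{bmatrix} \langle v_1, v_1 \rangle & \langle v_2, v_1 \rangle & \cdots & \langle v_N, v_1 \rangle \\ \langle v_1, v_2 \rangle & \langle v_2, v_2 \rangle & \cdots & \langle v_N, v_2 \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle v_1, v_N \rangle & \langle v_2, v_N \rangle & \cdots & \langle v_N, v_N \rangle \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix} = \begin{bmatrix} \langle x, v_1 \rangle \\ \langle x, v_2 \rangle \\ \vdots \\ \langle x, v_N \rangle \end{bmatrix}$$


More compactly, we want to find a vector $c \in \mathbb{C}^N$ such that $Gc = b$ where
Equation:

$$G_{jk} = \langle v_k, v_j \rangle, \qquad b_j = \langle x, v_j \rangle, \qquad j, k = 1, \dots, N.$$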


Note: 


• $G$ is called the “Grammian” or “Gram matrix” of $\{v_j\}$
• One can show, since $v_1, \dots, v_N$ are linearly independent, that $G$ is positive definite, and hence invertible.
• Also note that by construction, $G$ is conjugate symmetric, or “Hermitian”, i.e., $G = G^H$, where $^H$ denotes the conjugate transpose of $G$.


Thus, since $G^{-1}$ exists, we can write $c = G^{-1} b$ to calculate $c$.
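The procedure above (build the Gram matrix $G$ and the vector $b$, then solve $Gc = b$) can be sketched as follows for real vectors. This is an illustration under stated assumptions: the small Gaussian-elimination solver, the function names, and the test vector `x` are hypothetical, and the basis reuses $v_1 = [1, 1, 0]^T$, $v_2 = [0, 1, 0]^T$ from the span example.

```python
def inner(x, y):
    """Standard inner product on R^M."""
    return sum(a * b for a, b in zip(x, y))

def solve(G, b):
    """Solve G c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(G)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    c = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        s = sum(A[i][j] * c[j] for j in range(i + 1, n))
        c[i] = (A[i][n] - s) / A[i][i]
    return c

def best_approximation(x, basis):
    """Return coefficients c and the projection x_hat of x onto span(basis)."""
    G = [[inner(vk, vj) for vk in basis] for vj in basis]  # G[j][k] = <v_k, v_j>
    b = [inner(x, vj) for vj in basis]                      # b[j] = <x, v_j>
    c = solve(G, b)
    x_hat = [sum(ck * vk[i] for ck, vk in zip(c, basis)) for i in range(len(x))]
    return c, x_hat

basis = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
x = [2.0, 5.0, 7.0]          # hypothetical signal to approximate
c, x_hat = best_approximation(x, basis)
error = [a - b for a, b in zip(x, x_hat)]
```

Since the span is the $x_1 x_2$-plane, the projection simply drops the third coordinate, and the error is orthogonal to both basis vectors, as the orthogonality principle requires.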


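As a special case, suppose now that $\{v_j\}$ is an orthobasis for $A$. What is $G$?
It is just the identity matrix $I$! Computing $c$ just got much easier, since now
$c = b$. Plugging this $c$ back into our formula for $Px$ we obtain

Equation:

$$Px = \sum_{j=1}^{N} \langle x, v_j \rangle v_j.$$

Just to verify, note that $P$ is indeed a projection matrix:
Equation:

$$P(Px) = \sum_{j=1}^{N} \left\langle \sum_{k=1}^{N} \langle x, v_k \rangle v_k, v_j \right\rangle v_j = \sum_{j=1}^{N} \sum_{k=1}^{N} \langle x, v_k \rangle \langle v_k, v_j \rangle v_j = \sum_{j=1}^{N} \langle x, v_j \rangle v_j = Px.$$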



Example: 
Suppose $f \in L_2([0, 1])$ is given by
Equation: 


Let $A = \{\text{piecewise constant functions on } [0, \tfrac{1}{4}), [\tfrac{1}{4}, \tfrac{1}{2}), [\tfrac{1}{2}, \tfrac{3}{4}), [\tfrac{3}{4}, 1]\}$.
Our goal is to find the closest (in $L_2$) function in $A$ to $f(t)$. Using
v1,...,Ua4 from before, we can calculate c; = =  =063— = 


C4 = — Thus, we have that 
Equation: 


Matrix Representation of the Approximation Problem 


Suppose our inner product space $V = \mathbb{R}^M$ or $\mathbb{C}^M$ with the standard inner
product (which induces the $\ell_2$-norm).


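Re-examining what we have just derived, we can write our approximation
$\hat{x} = Px = Vc$, where $V$ is an $M \times N$ matrix given by
Equation:

$$V = \begin{bmatrix} v_1 & v_2 & \cdots & v_N \end{bmatrix}$$

and $c$ is an $N \times 1$ vector given by
Equation:

$$c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix}.$$

Given $x \in \mathbb{R}^M$ (or $\mathbb{C}^M$), our search for the closest approximation can be
written as
Equation:

$$\min_{c} \|x - Vc\|_2^2$$

or as
Equation:

$$\min_{c, e} \|e\|_2^2 \quad \text{subject to} \quad x = Vc + e.$$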
c,e 


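Using $V$, we can replace $G = V^H V$ and $b = V^H x$. Thus, our solution can
be written as
Equation:

$$c = (V^H V)^{-1} V^H x,$$

which yields the formula
Equation:

$$\hat{x} = V (V^H V)^{-1} V^H x.$$

The matrix $V^\dagger = (V^H V)^{-1} V^H$ is known as the “pseudo-inverse.” Why
the name “pseudo-inverse”? Observe that
Equation:

$$V^\dagger V = (V^H V)^{-1} V^H V = I.$$

Note that $\hat{x} = V V^\dagger x$. We can verify that $V V^\dagger$ is a projection matrix since
Equation:
$$\begin{aligned} V V^\dagger V V^\dagger &= V (V^H V)^{-1} V^H V (V^H V)^{-1} V^H \\ &= V (V^H V)^{-1} V^H \\ &= V V^\dagger. \end{aligned}$$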


Thus, given a set of $N$ linearly independent vectors in $\mathbb{R}^M$ or $\mathbb{C}^M$ ($N \le M$),
we can use the pseudo-inverse to project any vector onto the subspace
defined by those vectors. This can be useful any time we have a problem of
the form:

Equation:

$$x = Vc + e$$

where $x$ denotes a set of known “observations”, $V$ is a set of known
“expansion vectors”, $c$ are the unknown coefficients, and $e$ represents an
unknown “noise” vector. In this case, the least-squares estimate is given by
Equation:

$$\hat{c} = V^\dagger x, \qquad \hat{x} = V V^\dagger x.$$
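A common instance of the model $x = Vc + e$ is fitting a straight line to noisy samples. In the sketch below the two columns of $V$ are assumed to be the all-ones vector and the vector of sample times, so that $G = V^T V$ is $2 \times 2$ and can be inverted in closed form. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(ts, xs):
    """Least-squares coefficients (c0, c1) for the model x ~ c0 + c1 * t."""
    n = float(len(ts))
    # Entries of G = V^T V, where V has columns [1, ..., 1] and [t_1, ..., t_M].
    g11, g12, g22 = n, sum(ts), sum(t * t for t in ts)
    # Entries of b = V^T x.
    b1, b2 = sum(xs), sum(t * x for t, x in zip(ts, xs))
    det = g11 * g22 - g12 * g12   # nonzero when the two columns are independent
    c0 = (g22 * b1 - g12 * b2) / det
    c1 = (g11 * b2 - g12 * b1) / det
    return c0, c1

ts = [0.0, 1.0, 2.0, 3.0]
xs = [1.1, 2.9, 5.2, 6.8]          # hypothetical noisy observations
c0, c1 = fit_line(ts, xs)
residual = [x - (c0 + c1 * t) for t, x in zip(ts, xs)]
```

The residual plays the role of the estimated noise vector, and it is orthogonal to both columns of $V$: its entries sum to zero, and so do its entries weighted by the sample times.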


Orthobasis Expansions 


Suppose that the $\{v_j\}_{j=1}^{N}$ are a finite-dimensional orthobasis. In this case
we have
Equation:

$$\hat{x} = \sum_{j=1}^{N} \langle x, v_j \rangle v_j.$$

But what if $x \in \mathrm{span}(\{v_j\}) = V$ already? Then we simply have
Equation:

$$x = \sum_{j=1}^{N} \langle x, v_j \rangle v_j$$

for all $x \in V$. This is often called the “reproducing formula”. In infinite
dimensions, if $V$ has an orthobasis $\{v_j\}_{j=1}^{\infty}$ and $x \in V$ has


Equation: 


then we can write
Equation:

$$x = \sum_{j=1}^{\infty} \langle x, v_j \rangle v_j.$$

In other words, $x$ is perfectly captured by the list of numbers
$\langle x, v_1 \rangle, \langle x, v_2 \rangle, \dots$


Sound familiar? 


Example: 


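• $V = \mathbb{C}^N$, $\{v_k\}$ is the standard basis.
Equation:

$$x_k = \langle x, v_k \rangle$$

• $V = L_2[-\pi, \pi]$, $v_k(t) = \frac{1}{\sqrt{2\pi}} e^{jkt}$. For any $f \in V$ we have

Equation:
$$f(t) = \sum_{k=-\infty}^{\infty} c_k v_k(t)$$
where
Equation:

$$c_k = \langle f, v_k \rangle = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(t) e^{-jkt} \, dt.$$


The general lesson is that we can recreate a vector $x$ in an inner product
space from the coefficients $\{\langle x, v_k \rangle\}$. We can think of $\{\langle x, v_k \rangle\}$ as
“transform coefficients.”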


Parseval's and Plancherel's Theorems 


When dealing with transform coefficients, we will see that our notions of 
distance and angle carry over to the coefficient space. 


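Let $x, y \in V$ and suppose that $\{v_k\}_{k \in \Gamma}$ is an orthobasis. ($\Gamma$ denotes the
index set, which could be finite or infinite.) Then $x = \sum_{k \in \Gamma} \langle x, v_k \rangle v_k$ and
$y = \sum_{k \in \Gamma} \langle y, v_k \rangle v_k$, and

Equation:

$$\langle x, y \rangle = \left\langle \sum_{k \in \Gamma} \langle x, v_k \rangle v_k, \sum_{j \in \Gamma} \langle y, v_j \rangle v_j \right\rangle = \sum_{k \in \Gamma} \sum_{j \in \Gamma} \langle x, v_k \rangle \overline{\langle y, v_j \rangle} \langle v_k, v_j \rangle.$$

So

Equation:

$$\langle x, y \rangle = \sum_{k \in \Gamma} \langle x, v_k \rangle \overline{\langle y, v_k \rangle}.$$

This is Plancherel's theorem. Parseval's theorem follows since
$\langle x, x \rangle = \sum_{k \in \Gamma} |\langle x, v_k \rangle|^2$, which implies that $\|x\|^2 = \sum_{k \in \Gamma} |\langle x, v_k \rangle|^2$. Thus, an orthobasis
makes every inner product space equivalent to $\ell_2$!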
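For an orthobasis, Plancherel's relation says that the inner product of two vectors equals the inner product of their coefficient sequences, and Parseval's relation says the same for energies. The sketch below checks both in $\mathbb{C}^2$ using a rotated copy of the standard basis; the rotation angle, the test vectors, and the helper names are hypothetical choices.

```python
import math

theta = 0.7  # arbitrary rotation angle (hypothetical choice)
v1 = (math.cos(theta), math.sin(theta))
v2 = (-math.sin(theta), math.cos(theta))   # v1, v2 form an orthobasis

def inner(x, y):
    """Standard inner product: sum of x_i * conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def coefficients(x):
    """Transform coefficients <x, v_k> with respect to the orthobasis."""
    return [inner(x, v1), inner(x, v2)]

x = (1 + 2j, 3 - 1j)
y = (-2 + 0.5j, 1j)
a = coefficients(x)
b = coefficients(y)

lhs = inner(x, y)                                   # inner product in V
rhs = sum(p * q.conjugate() for p, q in zip(a, b))  # inner product of coefficients

energy_x = inner(x, x).real                         # ||x||^2
energy_a = sum(abs(p) ** 2 for p in a)              # sum of |<x, v_k>|^2
```

Both pairs agree up to rounding error, which is the sense in which distances and angles carry over to the coefficient space.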


Error of the Best Approximation in an Orthobasis 


As an application of Parseval's Theorem, say $\{v_k\}_{k=1}^{\infty}$ is an orthobasis for
an inner product space $V$.


Let $A$ be the subspace spanned by the first 10 elements of $\{v_k\}$, i.e.,
$A = \mathrm{span}(\{v_1, \dots, v_{10}\})$.


1. Given $x \in V$, what is the closest point in $A$ (call it $\hat{x}$) to $x$? We have
seen that it is $\hat{x} = \sum_{k=1}^{10} \langle x, v_k \rangle v_k$.


2. How good of an approximation is $\hat{x}$ to $x$? Measured with $\|\cdot\|_V$:
Equation:
$$\begin{aligned} \|x - \hat{x}\|_V^2 &= \left\| \sum_{k > 10} \langle x, v_k \rangle v_k \right\|_V^2 \\ &= \sum_{k > 10} |\langle x, v_k \rangle|^2. \end{aligned}$$


Since we also have that $\|x\|_V^2 = \sum_{k=1}^{\infty} |\langle x, v_k \rangle|^2$, the approximation $\hat{x}$
will be “good” if the first 10 transform coefficients contain “most” of the
total energy. Constructing these types of approximations is exactly what is
done in image compression.


Approximation in $\ell_p$ Norms


So far, our approximation problem has been posed in an inner product
space, and we have thus measured our approximation error using norms that
are induced by an inner product, such as the $L_2$/$\ell_2$ norms (or weighted
$L_2$/$\ell_2$ norms). Sometimes this is a natural choice: it can be interpreted as
the “energy” in the error and arises often in the case of signals corrupted by
Gaussian noise. However, more often than not, it is used simply because it
is easy to deal with.


In some cases we might be interested in approximating with respect to other
norms; in particular we will consider approximation with respect to $\ell_p$
norms for $p \ne 2$. First, we introduce the concept of a “unit ball”. Any norm
gives rise to a unit ball, i.e., $\{x : \|x\| = 1\}$. Some important examples
of unit balls for the $\ell_p$ norms in $\mathbb{R}^2$ are depicted below.


We now consider an example of approximating a point in $\mathbb{R}^2$ with a point in
a 1-D subspace while measuring error using the $\ell_p$ norm for $p = 1, 2, \infty$.


Example: 
Suppose V = R?, 
Equation: 


We will want to find $\hat{x} \in A$ that minimizes $\|x - \hat{x}\|_p$. Since $\hat{x} \in A$, we


can write 
Equation: 


and thus 
Equation: 


While we can solve for $\alpha \in \mathbb{R}$ to minimize $\|e\|_p$ directly in some cases, a
geometric interpretation is also useful. In each case, one can imagine
growing an $\ell_p$ ball centered on $x$ until the ball intersects with $A$. This will
be the point $\hat{x} \in A$ that is closest to $x$ in the $\ell_p$ norm. We first illustrate
this for the $\ell_2$ norm below:


In order to calculate & we can apply the orthogonality principle. Since 


(e, [2 1") = 0 we obtain a solution defined by a = ¢. 
We now observe that in the case of the $\ell_\infty$ norm the picture changes
somewhat. The closest point in $\ell_\infty$ is illustrated below:


Note that the error is no longer orthogonal to the subspace A. In this case 
we can still calculate & from the observation that the two terms in the error 
should be equal, which yields a = =. 

The situation is even more different for the case of the $\ell_1$ norm, which is
illustrated below:


We now observe that £ corresponds to a = 1. Note that in this case the 
error term is [0 Die This punctuates a general trend: for large values of p, 
the $\ell_p$ norm tends to spread error evenly across all terms, while for small
values of $p$ the error is more highly concentrated.
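The trend described above can be explored with a brute-force search over $\alpha$. The sketch below is only illustrative: the direction $a = [2 \ 1]^T$ is the vector that appears in the orthogonality condition in the text, but the point `x`, the search range, and the grid resolution are hypothetical choices made so the example is self-contained.

```python
import math

def error_norm(x, a, alpha, p):
    """l_p norm of the error e = x - alpha * a; p may be math.inf."""
    e = [xi - alpha * ai for xi, ai in zip(x, a)]
    if p == math.inf:
        return max(abs(v) for v in e)
    return sum(abs(v) ** p for v in e) ** (1.0 / p)

def best_alpha(x, a, p, lo=-5.0, hi=5.0, steps=100001):
    """Grid search for the alpha minimizing the l_p error (a sketch, not a solver)."""
    best = None
    for k in range(steps):
        alpha = lo + (hi - lo) * k / (steps - 1)
        val = error_norm(x, a, alpha, p)
        if best is None or val < best[1]:
            best = (alpha, val)
    return best[0]

a = (2.0, 1.0)   # direction spanning the subspace A
x = (2.0, 2.0)   # hypothetical point to approximate

alpha_1 = best_alpha(x, a, 1)
alpha_2 = best_alpha(x, a, 2)
alpha_inf = best_alpha(x, a, math.inf)

e_1 = [xi - alpha_1 * ai for xi, ai in zip(x, a)]      # error concentrated in one entry
e_inf = [xi - alpha_inf * ai for xi, ai in zip(x, a)]  # error entries of equal magnitude
```

For this choice of `x`, the $\ell_1$ solution puts all of the error in a single coordinate, while the $\ell_\infty$ solution equalizes the magnitudes of the two error terms, matching the qualitative behavior described in the text.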


When is it useful to approximate in $\ell_p$ or $L_p$ norms for $p \ne 2$?


Example: 


• Filter Design: In some cases we will want the best fit to a specified
frequency response in an $L_\infty$ sense rather than the $L_2$ sense. This
minimizes the maximum error rather than total energy in the error. In
the figure below we illustrate a desired frequency response. If the $L_\infty$
norm of the error is small, then we are guaranteed that the
approximation to our desired frequency response will lie within the
illustrated bounds.

• Geometry representation: In compressing 3D geometry, it can be useful
to bound the $L_\infty$ error to ensure that basic shapes of narrow features
(like poles, power lines, etc.) are preserved.

• Sparsity: In the case where the error is known to be sparse (i.e., zero
on most indices) it can be useful to measure the error in the $\ell_1$ norm.


Linear Systems 


In this course we will focus much of our attention on linear systems. When 
our input and output signals are vectors, then the system is a linear 
operator. 


Suppose that $L : X \to Y$ is a linear operator from a vector space $X$ to a
vector space $Y$. If $X$ and $Y$ are normed vector spaces, then we can also
define a norm on $L$. Specifically, we can let

Equation:
$$\|L\|_{(X, Y)} = \sup_{x \ne 0} \frac{\|Lx\|_Y}{\|x\|_X}.$$

An operator for which $\|L\|_{(X, Y)} < \infty$ is called a bounded operator.
Example: 


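BIBO (bounded-input, bounded-output) stable systems are systems for
which

Equation:
$$\|x\|_\infty \le A \ \Rightarrow \ \|Lx\|_\infty \le B.$$

Such a system satisfies $\|L\|_{(L_\infty, L_\infty)} \le B/A$.
One can show that $\|\cdot\|_{(X, Y)}$ satisfies the requirements of a valid norm. In
fact, the set of bounded linear operators from $X$ to $Y$ is itself a
normed vector space! If $Y$ is a Banach space, then so is this space of operators!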

Bounded linear operators are common in DSP—they are “safe” in that 
“normal” inputs are guaranteed to not make your system explode. 


Are there any common systems that are unbounded? Not in finite 
dimensions, but in infinite dimensions there are plenty of examples! 


Example: 
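Consider $L_2[-\pi, \pi]$. For any $k$, $f_k(t) = \frac{1}{\sqrt{2\pi}} e^{jkt}$ is an element of
$L_2[-\pi, \pi]$ with $\|f_k(t)\|_2 = 1$. Consider the system $D = \frac{d}{dt}$, and note that
Equation:
$$D f_k(t) = \frac{jk}{\sqrt{2\pi}} e^{jkt} \quad \Rightarrow \quad \|D f_k(t)\|_2 = |k|.$$

Since $\|f_k(t)\|_2 = 1$ for all $k$, we can set $k$ to be as large as we want,
so $D$ cannot be bounded.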


A very important class of linear operators are those for which $X = Y$. In
this case we have the following important definition.


Definition 1

Suppose that $L : X \to X$ is a linear operator. An eigenvector is a vector $x$
for which $Lx = \alpha x$ for some $\alpha \in K$ (i.e., $\alpha \in \mathbb{R}$ or $\alpha \in \mathbb{C}$). In this case, $\alpha$
is called the corresponding eigenvalue.


Eigenvalues and eigenvectors tell you a lot about a system (more on this 
later!). While they can sometimes be tricky to calculate (unless you know 
the eig command in Matlab), we will see that as engineers we can usually 
get away with the time-honored method of “guess and check”. 


Discrete-Time Systems 


We begin with the simplest of discrete-time systems, where $X = \mathbb{C}^N$ and
$Y = \mathbb{C}^M$. In this case a linear operator is just an $M \times N$ matrix. We can
generalize this concept by letting $M$ and $N$ go to $\infty$, in which case we can
think of a linear operator $L : \ell_2(\mathbb{Z}) \to \ell_2(\mathbb{Z})$ as an infinite matrix.


Example: 
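Consider the shift operator $\Delta_k : \ell_2(\mathbb{Z}) \to \ell_2(\mathbb{Z})$ that takes a sequence and
shifts it by $k$. As an example, $\Delta_1$ can be viewed as the infinite matrix given
by


Equation:
$$\begin{bmatrix} \vdots \\ y_{-1} \\ y_0 \\ y_1 \\ \vdots \end{bmatrix} = \begin{bmatrix} \ddots & & & & \\ \cdots & 0 & 0 & 0 & \cdots \\ \cdots & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 1 & 0 & \cdots \\ & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-1} \\ x_0 \\ x_1 \\ \vdots \end{bmatrix}$$

that is, $y_n = x_{n-1}$, so the only nonzero entries are the ones on the first subdiagonal.


Note that $\| \Delta_k \|_{(\ell_p, \ell_p)} = 1$ (for any $k$ and $p$) since the delay doesn't change the
norm of $x$. The delay operator is also an example of a linear shift-invariant
(LSI) system.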


Definition 1 
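An operator $L : \ell_2(\mathbb{Z}) \to \ell_2(\mathbb{Z})$ is called shift-invariant if
$L(\Delta_k(x)) = \Delta_k(L(x))$ for all $x \in \ell_2(\mathbb{Z})$ and for any $k \in \mathbb{Z}$.


Observe that $\Delta_{k_1}(\Delta_{k_2}(x)) = \Delta_{k_1 + k_2}(x)$ so that $\Delta_k$ itself is an LSI operator.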


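Let's take a closer look at the structure of an LSI system by viewing it as an
infinite matrix. In this case we write $y = Hx$ to denote
Equation:
$$\begin{bmatrix} \vdots \\ y_{-1} \\ y_0 \\ y_1 \\ \vdots \end{bmatrix} = \begin{bmatrix} & \vdots & \vdots & \vdots & \\ \cdots & h^{-1} & h^{0} & h^{1} & \cdots \\ & \vdots & \vdots & \vdots & \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-1} \\ x_0 \\ x_1 \\ \vdots \end{bmatrix}$$

where $h^k$ denotes the column of $H$ that multiplies $x_k$.


Suppose we want to figure out the column of $H$ corresponding to $h^0$. What
input $x$ could help us determine $h^0$? Consider the vector
Equation:
$$x = \begin{bmatrix} \cdots & 0 & 0 & 1 & 0 & 0 & \cdots \end{bmatrix}^T$$

i.e., $x = \delta[n]$. For this input $y = Hx = h^0$. What about $h^1$?
$\Delta_1(x) = \delta[n-1]$ would yield $h^1$. In general $\Delta_k(x) = \delta[n-k]$ tells us the
column $h^k$. But, if $H$ is LSI, then

Equation:
$$h^k = H(\Delta_k(\delta[n])) = \Delta_k(H(\delta[n])) = \Delta_k(h^0)$$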


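This means that each column is just a shifted version of $h^0$, which is usually
called the impulse response.


Now just to keep notation clean, let $h = h^0$ denote the impulse response. Can
we get a simple formula for the output $y$ in terms of $h$ and $x$? Observe that we
can write

Equation:
$$\begin{bmatrix} \vdots \\ y_{-1} \\ y_0 \\ y_1 \\ \vdots \end{bmatrix} = \begin{bmatrix} \ddots & & & & \\ \cdots & h_0 & h_{-1} & h_{-2} & \cdots \\ \cdots & h_1 & h_0 & h_{-1} & \cdots \\ \cdots & h_2 & h_1 & h_0 & \cdots \\ & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-1} \\ x_0 \\ x_1 \\ \vdots \end{bmatrix}$$


Each column is just shifted down one. (Each successive row is also shifted
right one.) Looking at $y_{-1}$, $y_0$ and $y_1$, we can rewrite this formula as
Equation:
$$\begin{bmatrix} y[-1] \\ y[0] \\ y[1] \end{bmatrix} = \cdots + x[-1] \begin{bmatrix} h[0] \\ h[1] \\ h[2] \end{bmatrix} + x[0] \begin{bmatrix} h[-1] \\ h[0] \\ h[1] \end{bmatrix} + x[1] \begin{bmatrix} h[-2] \\ h[-1] \\ h[0] \end{bmatrix} + \cdots$$


From this we can observe the general pattern
Equation:
$$y[n] = \cdots + x[-1] h[n+1] + x[0] h[n] + x[1] h[n-1] + \cdots$$


or more concisely
Equation:
$$y[n] = \sum_{k=-\infty}^{\infty} x[k] h[n-k].$$


Does this look familiar? It is simply the formula for the discrete-time
convolution of $x$ and $h$, i.e.,
Equation:
$$y = x * h.$$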
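
The matrix picture and the convolution sum can be compared on a finite example. The sketch below (Python with NumPy) assumes a short causal impulse response and a finite-length input, both hypothetical, so that a finite section of the infinite matrix suffices. `lsi_matrix` is a helper name introduced here.

```python
import numpy as np

def lsi_matrix(h, n_in):
    """Finite section of the LSI matrix for a causal FIR impulse response h.

    Column k holds h delayed by k samples, so H @ x equals the linear
    convolution of x (length n_in) with h.
    """
    h = np.asarray(h, dtype=float)
    H = np.zeros((n_in + len(h) - 1, n_in))
    for k in range(n_in):
        H[k:k + len(h), k] = h      # column k is h shifted down by k
    return H

h = np.array([1.0, 0.5, 0.25])       # hypothetical impulse response
x = np.array([2.0, -1.0, 0.0, 3.0])  # hypothetical input

H = lsi_matrix(h, len(x))
y_matrix = H @ x
y_conv = np.convolve(x, h)           # y[n] = sum_k x[k] h[n - k]
```

Feeding in a unit impulse picks out the first column of `H`, which is the impulse response itself.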


Eigenvectors of LSI Systems 


Suppose that $h$ is the impulse response of an LSI system. Consider an input
$x[n] = z^n$ where $z$ is a complex number. What is the output of the system?
Recall that $x * h = h * x$. In this case, it is easier to use the formula:
Equation:
$$\begin{aligned} y[n] &= \sum_{k=-\infty}^{\infty} h[k] x[n-k] \\ &= \sum_{k=-\infty}^{\infty} h[k] z^{n-k} \\ &= z^n \sum_{k=-\infty}^{\infty} h[k] z^{-k} \\ &= x[n] H(z) \end{aligned}$$
where
Equation:
$$H(z) = \sum_{k=-\infty}^{\infty} h[k] z^{-k}.$$


In the event that $H(z)$ converges, we see that $y[n]$ is just a re-scaled version
of $x[n]$. Thus, $x[n]$ is an eigenvector of the system $H$, right? Not exactly,
but almost... technically, since $z^n \notin \ell_2(\mathbb{Z})$ it isn't really an eigenvector.
However, most DSP texts ignore this subtlety. The intuition provided by 
thinking of z” as an eigenvector is worth the slight abuse of terminology. 


Next time we will analyze the function H(z) in greater detail. H(z) is 
called the z-transform of h, and provides an extremely useful 
characterization of a discrete-time system. 
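
The eigenvector property is easy to test for a finite impulse response, where $H(z)$ is a finite sum and always converges for $z \neq 0$. In this Python/NumPy sketch the impulse response and the value of $z$ are hypothetical, and the function names are ours.

```python
import numpy as np

h = np.array([1.0, -0.5, 0.25])      # hypothetical causal FIR impulse response
z = 0.9 * np.exp(1j * 0.7)           # any nonzero complex number

def H_of_z(h, z):
    """Evaluate H(z) = sum_k h[k] z^{-k} for a causal FIR h."""
    k = np.arange(len(h))
    return np.sum(h * z ** (-k))

def respond_to_exponential(h, z, n):
    """Output sample y[n] = sum_k h[k] x[n - k] for the input x[n] = z^n."""
    k = np.arange(len(h))
    return np.sum(h * z ** (n - k))

n_values = np.arange(-3, 4)
y = np.array([respond_to_exponential(h, z, n) for n in n_values])
x = z ** n_values
ratio = y / x                         # should equal H(z) at every n
```

The ratio $y[n] / x[n]$ is the same complex number at every $n$, namely $H(z)$.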


The z-Transform 




We introduced the z-transform before as
Equation:
$$H(z) = \sum_{k=-\infty}^{\infty} h[k] z^{-k}$$


where $z$ is a complex number. When $H(z)$ exists (the sum converges), it
can be interpreted as the “response” of an LSI system with impulse
response $h[n]$ to the input of $z^n$. The z-transform is useful mostly due to its
ability to simplify system analysis via the following result.


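Theorem
If $y = h * x$, then $Y(z) = H(z) X(z)$.


Proof
First observe that
Equation:
$$\begin{aligned} \sum_{n=-\infty}^{\infty} y[n] z^{-n} &= \sum_{n=-\infty}^{\infty} \left( \sum_{k=-\infty}^{\infty} x[k] h[n-k] \right) z^{-n} \\ &= \sum_{k=-\infty}^{\infty} x[k] \sum_{n=-\infty}^{\infty} h[n-k] z^{-n} \end{aligned}$$


Let $m = n - k$, and note that $z^{-n} = z^{-m} \cdot z^{-k}$. Thus we have
Equation:
$$\begin{aligned} Y(z) &= \sum_{k=-\infty}^{\infty} x[k] \left( \sum_{m=-\infty}^{\infty} h[m] z^{-m} \right) z^{-k} \\ &= H(z) \sum_{k=-\infty}^{\infty} x[k] z^{-k} \\ &= H(z) X(z). \end{aligned}$$

This yields the “transfer function”
Equation:
$$H(z) = \frac{Y(z)}{X(z)}.$$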

The Discrete-Time Fourier Transform 




The (non-normalized) DTFT is simply a special case of the z-transform for
the case $|z| = 1$, i.e., $z = e^{j\omega}$ for some value $\omega \in [-\pi, \pi]$:
Equation:
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n] e^{-j\omega n}$$


The picture you should have in mind is the complex plane. The z-transform
is defined on the whole plane, and the DTFT is simply the value of the
z-transform on the unit circle, as illustrated below.


This picture should make it clear why the DTFT is defined only for
$\omega \in [-\pi, \pi]$ (or why it is periodic). Using the normalization above, we also
have the inverse DTFT formula:

Equation:
$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega}) e^{j\omega n} \, d\omega$$


z-Transform Examples 




Example: 
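Consider the z-transform given by $H(z) = z$, as illustrated below.
[Figure: pole-zero plot in the z-plane, with axes Re[z] and Im[z].]


The corresponding DTFT has magnitude and phase given below.
[Figure: magnitude $|H(e^{j\omega})|$ and phase of the DTFT.]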


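What could the system be doing? It is a perfect all-pass, linear-phase
system. But what does this mean?


Suppose $h[n] = \delta[n - n_0]$. Then

Equation:
$$H(z) = \sum_{n=-\infty}^{\infty} \delta[n - n_0] z^{-n} = z^{-n_0}.$$

Thus, $H(z) = z^{-n_0}$ is the z-transform of a system that simply delays the
input by $n_0$. $H(z) = z^{-1}$ is the z-transform of a unit-delay.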

Example: 


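Now consider $h[n] = a^n u[n]$:


Equation:
$$H(z) = \sum_{n=0}^{\infty} a^n z^{-n} = \sum_{n=0}^{\infty} \left( \frac{a}{z} \right)^n = \frac{1}{1 - a/z} = \frac{z}{z - a}.$$


What if $|z| \le |a|$? Then $\sum_{n=0}^{\infty} (a/z)^n$ does not converge! Therefore,
whenever we compute a z-transform, we must also specify the set of $z$'s for
which the z-transform exists. This is called the region of convergence
(ROC). In the above example, the ROC $= \{ z : |z| > |a| \}$.
[Figure: the ROC in the z-plane, with axes Re[z] and Im[z].]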


Example: 
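What about the “evil twin” $h[n] = -a^n u[-1-n]$?


Equation:
$$H(z) = \sum_{n=-\infty}^{-1} -a^n z^{-n} = -\sum_{n=1}^{\infty} \left( \frac{z}{a} \right)^n = -\frac{z/a}{1 - z/a} = \frac{z}{z - a}.$$


We get the exact same result but with ROC $= \{ z : |z| < |a| \}$.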
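
The role of the ROC shows up clearly in partial sums. This Python/NumPy sketch uses a hypothetical value of $a$ and the right-sided sequence $a^n u[n]$. The partial sums settle on $z / (z - a)$ when $|z| > |a|$ and grow without bound when $|z| < |a|$.

```python
import numpy as np

a = 0.8  # hypothetical pole location

def partial_sum(z, n_terms):
    """Partial sum of sum_{n=0}^{n_terms-1} (a/z)^n, i.e. of the z-transform of a^n u[n]."""
    n = np.arange(n_terms)
    return np.sum((a / z) ** n)

def closed_form(z):
    """The closed-form expression z / (z - a)."""
    return z / (z - a)

z_inside_roc = 1.2 * np.exp(1j * 0.3)    # |z| > |a|: the series converges
z_outside_roc = 0.5 * np.exp(1j * 0.3)   # |z| < |a|: the series diverges

err_inside = abs(partial_sum(z_inside_roc, 200) - closed_form(z_inside_roc))
growth_outside = [abs(partial_sum(z_outside_roc, n)) for n in (10, 20, 40)]
```

The closed form can still be evaluated at the second point, but there it describes the left-sided “evil twin” rather than $a^n u[n]$.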


z-Transform Analysis of Discrete-Time Filters 




The z-transform might seem slightly ugly. We have to worry about the
region of convergence, and we haven't even talked about how to invert it
yet (it isn't pretty). However, in the end it is worth it because it is extremely
useful in analyzing digital filters with feedback. For example, consider the
system illustrated below.


[Figure: block diagram of a filter with feedback, with output $y[n]$.]


We can analyze this system via the equations 
Equation: 


and 
Equation: 


More generally, 
Equation: 


and 
Equation: 


or equivalently 
Equation: 


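In general, many LSI systems satisfy linear difference equations of the
form:
Equation:
$$\sum_{k=0}^{N} a_k y[n-k] = \sum_{k=0}^{M} b_k x[n-k].$$


What does the z-transform of this relationship look like?
Equation:
$$\sum_{k=0}^{N} a_k \sum_{n=-\infty}^{\infty} y[n-k] z^{-n} = \sum_{k=0}^{M} b_k \sum_{n=-\infty}^{\infty} x[n-k] z^{-n}.$$


Note that
Equation:
$$\sum_{n=-\infty}^{\infty} y[n-k] z^{-n} = z^{-k} \sum_{m=-\infty}^{\infty} y[m] z^{-m} = z^{-k} Y(z).$$


Thus the relationship above reduces to
Equation:
$$Y(z) \sum_{k=0}^{N} a_k z^{-k} = X(z) \sum_{k=0}^{M} b_k z^{-k} \quad \Longrightarrow \quad H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} b_k z^{-k}}{\sum_{k=0}^{N} a_k z^{-k}}.$$


Hence, given a system like the one above, we can pretty much immediately
write down the system's transfer function, and we end up with a rational
function, i.e., a ratio of two polynomials in $z^{-1}$ (or, after multiplying
through by a power of $z$, in $z$). Similarly, given a rational
function, it is easy to realize this function in a simple hardware architecture.
We will focus exclusively on such rational functions in this course.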
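
The link between a difference equation and its rational transfer function can be checked numerically. The sketch below (Python with NumPy) assumes the causal, initially-at-rest solution of the difference equation, uses hypothetical coefficients, and compares the z-transform of the resulting impulse response with the ratio of polynomials. The function names are ours.

```python
import numpy as np

def difference_equation(b, a, x):
    """Run sum_k a[k] y[n-k] = sum_k b[k] x[n-k] causally, starting from rest.

    Solves for y[n] = (sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k]) / a[0].
    """
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc / a[0]
    return y

def transfer_function(b, a, z):
    """Evaluate H(z) = (sum_k b[k] z^{-k}) / (sum_k a[k] z^{-k})."""
    num = sum(bk * z ** (-k) for k, bk in enumerate(b))
    den = sum(ak * z ** (-k) for k, ak in enumerate(a))
    return num / den

# Hypothetical coefficients; the poles are at 0.5 and 0.4.
b = [1.0, 0.5]
a = [1.0, -0.9, 0.2]

impulse = np.zeros(200)
impulse[0] = 1.0
h = difference_equation(b, a, impulse)          # causal impulse response

z = 1.1 * np.exp(1j * 0.4)                      # a point outside both poles
H_from_series = np.sum(h * z ** (-np.arange(len(h))))
H_from_formula = transfer_function(b, a, z)
```

The truncated series agrees with the formula because the chosen $z$ lies outside both poles, where the causal impulse response decays quickly relative to $|z|^n$.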


Poles and Zeros 




Suppose that $X(z)$ is a rational function, i.e.,
Equation:
$$X(z) = \frac{P(z)}{Q(z)}$$

where $P(z)$ and $Q(z)$ are both polynomials in $z$. The roots of $P(z)$ and
$Q(z)$ are very important.


Definition
A zero of $X(z)$ is a value of $z$ for which $X(z) = 0$ (or $P(z) = 0$). A
pole of $X(z)$ is a value of $z$ for which $X(z) = \infty$ (or $Q(z) = 0$).


For finite values of $z$, poles are the roots of $Q(z)$, but poles can also occur
at $z = \infty$. We denote poles in a z-plane plot by “x” and zeros by “o”.
Note that the ROC clearly cannot contain any poles since by definition
the ROC only contains $z$ for which the z-transform converges, and it does
not converge at poles.


Example: 
Consider
Equation:
$$x_1[n] = a^n u[n] \quad \longleftrightarrow \quad X_1(z) = \frac{z}{z - a}, \quad |z| > |a|$$
and
Equation:
$$x_2[n] = -a^n u[-1-n] \quad \longleftrightarrow \quad X_2(z) = \frac{z}{z - a}, \quad |z| < |a|.$$


Note that the poles and zeros of $X_1(z)$ and $X_2(z)$ are identical, but with
opposite ROCs. Note also that neither ROC contains the point $a$.


Example: 
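Consider
Equation:
$$x_3[n] = \left( \tfrac{1}{2} \right)^n u[n] + \left( -\tfrac{1}{3} \right)^n u[n].$$


We can compute the z-transform of $x_3[n]$ by simply adding the
z-transforms of the two different terms in the sum, which are given by
Equation:
$$\left( \tfrac{1}{2} \right)^n u[n] \quad \longleftrightarrow \quad \frac{z}{z - \frac{1}{2}}, \quad \text{ROC: } |z| > \tfrac{1}{2}$$


and
Equation:
$$\left( -\tfrac{1}{3} \right)^n u[n] \quad \longleftrightarrow \quad \frac{z}{z + \frac{1}{3}}, \quad \text{ROC: } |z| > \tfrac{1}{3}.$$


The poles and zeros for these z-transforms are illustrated below.
[Figure: pole-zero plots in the z-plane, with axes Re[z] and Im[z].]


$X_3(z)$ is given by
Equation:
$$X_3(z) = \frac{z}{z - \frac{1}{2}} + \frac{z}{z + \frac{1}{3}} = \frac{2z \left( z - \frac{1}{12} \right)}{\left( z - \frac{1}{2} \right) \left( z + \frac{1}{3} \right)}, \quad \text{ROC: } |z| > \tfrac{1}{2}.$$


Note that the poles do not change, but the zeros do, as illustrated above.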


Example: 
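Now consider the finite-length sequence
Equation:
$$x_4[n] = \begin{cases} a^n, & 0 \le n \le N-1 \\ 0, & \text{otherwise.} \end{cases}$$

[Figure: plot of $x_4[n]$.]


The z-transform for this sequence is


Equation:
$$\begin{aligned} X_4(z) &= \sum_{n=0}^{N-1} a^n z^{-n} = \sum_{n=0}^{N-1} \left( \frac{a}{z} \right)^n \\ &= \frac{1 - (a/z)^N}{1 - a/z} \\ &= \frac{1}{z^{N-1}} \cdot \frac{z^N - a^N}{z - a}. \end{aligned}$$
We can immediately see that the zeros of $X_4(z)$ occur when $z^N = a^N$.


Recalling the “$N^{\text{th}}$ roots of unity”, we see that the zeros are given by
Equation:
$$z_k = a e^{j 2\pi k / N}, \quad k = 0, 1, \ldots, N-1.$$


At first glance, it might appear that there are $N - 1$ poles at zero and 1
pole at $a$, but the pole at $a$ is cancelled by the zero ($z_0$) at $a$. Thus, $X_4(z)$
actually has only $N - 1$ poles at zero and $N - 1$ zeros around a circle of
radius $|a|$ as illustrated below.


So, provided that $|a| < \infty$, the ROC is the entire z-plane except for the
origin. This actually holds for all causal finite-length sequences (a sequence
with nonzero samples at negative times additionally excludes $z = \infty$).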
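
The zero locations can be confirmed by root-finding. In this Python/NumPy sketch the values of $a$ and $N$ are hypothetical. Since $X_4(z) = z^{-(N-1)} \left( z^{N-1} + a z^{N-2} + \cdots + a^{N-1} \right)$, the zeros are the roots of the polynomial whose coefficients are the samples of $x_4$. `same_set` is a helper introduced here.

```python
import numpy as np

a, N = 0.9, 8                                   # hypothetical values
coeffs = a ** np.arange(N)                      # x4[0], ..., x4[N-1]

# Roots of z^{N-1} + a z^{N-2} + ... + a^{N-1} (highest power first).
zeros = np.roots(coeffs)

# Predicted zeros: a * exp(j 2 pi k / N) for k = 1, ..., N-1 (k = 0 is cancelled).
predicted = a * np.exp(2j * np.pi * np.arange(1, N) / N)

def same_set(u, v, tol=1e-8):
    """True if every element of u is within tol of some element of v, and vice versa."""
    d = np.abs(u[:, None] - v[None, :])
    return bool(np.all(d.min(axis=1) < tol) and np.all(d.min(axis=0) < tol))

zeros_match = same_set(zeros, predicted)
```

All $N - 1$ computed zeros lie on the circle of radius $a$, and none of them sits at $z = a$ itself.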


Stability, Causality, and the z-Transform 




In going from


Equation:
$$\sum_{k=0}^{N} a_k y[n-k] = \sum_{k=0}^{M} b_k x[n-k]$$
to
Equation:
$$H(z) = \frac{Y(z)}{X(z)}$$


we did not specify an ROC. If we factor $H(z)$, we can plot the poles and
zeros in the z-plane as below.


Several ROCs may be possible. Each ROC corresponds to a different 
impulse response, so which one should we choose? In general, there is no 
“right” choice, however, there are some choices that make sense in practice. 


In particular, if $h[n]$ is causal, i.e., if $h[n] = 0$ for $n < 0$, then the ROC
extends outward from the outermost pole. This can be seen in the examples
up to this point. Moreover, recall that a system is BIBO stable if the impulse
response $h \in \ell_1(\mathbb{Z})$. In this case,

Equation:
$$|H(z)| = \left| \sum_{n=-\infty}^{\infty} h[n] z^{-n} \right| \le \sum_{n=-\infty}^{\infty} \left| h[n] z^{-n} \right| = \sum_{n=-\infty}^{\infty} |h[n]| \, |z^{-n}|.$$


Consider the unit circle $z = e^{j\omega}$. In this case we have $|z^{-n}| = |e^{-j\omega n}| = 1$,
so that
Equation:
$$\left| H(e^{j\omega}) \right| \le \sum_{n=-\infty}^{\infty} |h[n]| < \infty$$


for all $\omega$. Thus, if a system is BIBO stable, the ROC of $H(z)$ must include
the unit circle. In general, any ROC containing the unit circle will be BIBO
stable.


This leads to a key question — are stability and causality always compatible? 
The answer is no. For example, consider 
Equation: 


—______ + 
(z — 2) z+4 (Same z+4 


and its various ROCs and corresponding inverses. If the ROC contains the
unit circle (so that the corresponding system is stable) and is not to contain
any poles, then it cannot extend outward past the pole at $z = 2$, and hence the
system cannot be causal. Alternatively, if the ROC is to extend outward, it will not
contain the unit circle, so that the corresponding system will not be BIBO
stable.
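
The trade-off can be tabulated directly from the pole radii. This sketch in Python with NumPy assumes a rational $H(z)$ with finitely many finite poles and no pole at infinity, so that each candidate ROC is a ring between consecutive pole radii. The function name `classify_rocs` and the pole values are ours.

```python
import numpy as np

def classify_rocs(poles):
    """List the candidate annular ROCs for a rational H(z) with the given finite poles.

    Each ROC is bounded by consecutive distinct pole radii. Returns tuples
    (r_inner, r_outer, is_causal_candidate, is_stable), where the outermost
    region corresponds to a right-sided (causal) response and a region is
    stable exactly when it contains the unit circle.
    """
    radii = sorted(set(np.round(np.abs(poles), 12)))
    edges = [0.0] + radii + [np.inf]
    regions = []
    for r_in, r_out in zip(edges[:-1], edges[1:]):
        causal = np.isinf(r_out)
        stable = r_in < 1.0 < r_out
        regions.append((float(r_in), float(r_out), bool(causal), bool(stable)))
    return regions

# Hypothetical pole set: one pole outside the unit circle, one inside.
regions = classify_rocs(np.array([2.0, -0.5]))

# With every pole inside the unit circle, one ROC is both causal and stable.
regions_all_inside = classify_rocs(np.array([0.5, -0.25]))
```

For the first pole set, the only stable ROC is the ring between the two poles, which is neither the innermost nor the outermost region.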


Inverse Systems 


Many signal processing problems can be interpreted as trying to undo the
action of some system. For example, echo cancellation, channel deconvolution,
etc. The problem is illustrated below.


[Figure: a cascade of $H$ followed by $H_I$, mapping $x[n]$ to $y[n]$ and back.]


If our goal is to design a system $H_I$ that reverses the action of $H$, then we
clearly need $H(z) H_I(z) = 1$. In the case where


Equation:
$$H(z) = \frac{P(z)}{Q(z)}$$
then this can be achieved via
Equation:
$$H_I(z) = \frac{Q(z)}{P(z)}.$$


Thus, the zeros of $H(z)$ become poles of $H_I(z)$, and the poles of $H(z)$
become zeros of $H_I(z)$. Recall that $H(z)$ being stable and causal implies
that all poles are inside the unit circle. If we want $H(z)$ to have a stable,
causal inverse $H_I(z)$, then we must have all zeros inside the unit circle
(since they become the poles of $H_I(z)$). Combining these, $H(z)$ is stable
and causal with a stable and causal inverse if and only if all poles and zeros
of $H(z)$ are inside the unit circle. This type of system is called a minimum
phase system.
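
A minimum-phase test reduces to root-finding on the numerator and denominator. This Python/NumPy sketch assumes $H(z)$ is given by coefficient lists in powers of $z^{-1}$, ignores any poles or zeros at the origin, and uses hypothetical coefficients. The function names are ours.

```python
import numpy as np

def poles_and_zeros(b, a):
    """Zeros and poles of H(z) = (sum_k b[k] z^{-k}) / (sum_k a[k] z^{-k}).

    Multiplying numerator and denominator by a power of z turns each into an
    ordinary polynomial in z whose coefficients, highest power first, are b and a.
    (Any extra poles or zeros at the origin are ignored here.)
    """
    return np.roots(b), np.roots(a)

def is_minimum_phase(b, a):
    """True if all finite poles and zeros lie strictly inside the unit circle."""
    zeros, poles = poles_and_zeros(b, a)
    return bool(np.all(np.abs(zeros) < 1) and np.all(np.abs(poles) < 1))

# Hypothetical filters for illustration.
b_good, a_good = [1.0, -0.5], [1.0, -0.9, 0.2]   # zero at 0.5; poles at 0.5, 0.4
b_bad = [1.0, -2.0]                               # zero at 2: inverse would be unstable

# Inverting swaps the roles of b and a.
inverse_is_stable = bool(np.all(np.abs(np.roots(b_good)) < 1))
```

Swapping `b` and `a` gives the inverse system, so the same test applied to the swapped pair answers whether the inverse is stable and causal.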


Inverse z-Transform 




Up to this point, we have ignored how to actually invert a z-transform to
find $x[n]$ from $X(z)$. Doing so is very different from inverting a DTFT. We
will consider three main techniques:


1. Inspection (look it up in a table) 
2. Partial fraction expansion 
3. Power series expansion 


One can also use contour integration combined with the Cauchy Residue 
Theorem. See Oppenheim and Schafer for details. 


Inspection 


Basically, become familiar with the z-transform pairs listed in tables, and
“reverse engineer”.


Example:
Suppose that
Equation:
$$X(z) = \frac{z}{z - a}, \quad |z| > |a|.$$


By now you should be able to recognize that $x[n] = a^n u[n]$.


Partial fraction expansion 


If $X(z)$ is rational, break it up into a sum of elementary forms, each of
which can be inverted by inspection.


Example:
Suppose that
Equation:
$$X(z) = \frac{1 + 2z^{-1} + z^{-2}}{1 - \frac{3}{2} z^{-1} + \frac{1}{2} z^{-2}}.$$


By computing a partial fraction expansion we can decompose $X(z)$ into
Equation:
$$X(z) = 2 + \frac{8}{1 - z^{-1}} - \frac{9}{1 - \frac{1}{2} z^{-1}},$$


where each term in the sum can be inverted by inspection.
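
A partial fraction expansion is easy to sanity-check numerically. This Python/NumPy sketch takes the coefficients $1, 2, 1$ over $1, -\tfrac{3}{2}, \tfrac{1}{2}$ and the expansion constants $2$, $8$, and $-9$, then compares the two forms at a test point. It then inverts term by term under the assumption of a right-sided sequence (ROC $|z| > 1$, which the example does not state) and cross-checks against the power series of the rational form.

```python
import numpy as np

b = [1.0, 2.0, 1.0]        # numerator coefficients of z^0, z^-1, z^-2
a = [1.0, -1.5, 0.5]       # denominator coefficients of z^0, z^-1, z^-2

def X_rational(z):
    w = 1.0 / z
    return (b[0] + b[1] * w + b[2] * w**2) / (a[0] + a[1] * w + a[2] * w**2)

def X_expanded(z):
    w = 1.0 / z
    return 2.0 + 8.0 / (1.0 - w) - 9.0 / (1.0 - 0.5 * w)

# Right-sided inverse (assumes the ROC |z| > 1):
# 2 -> 2*delta[n],  8/(1 - z^-1) -> 8*u[n],  -9/(1 - z^-1/2) -> -9*(1/2)^n u[n].
n = np.arange(10)
x_from_table = 8.0 - 9.0 * 0.5**n
x_from_table[0] += 2.0

# Cross-check via power-series (long) division of the rational form.
x_from_recursion = np.zeros(len(n))
for i in n:
    acc = b[i] if i < len(b) else 0.0
    acc -= sum(a[k] * x_from_recursion[i - k] for k in range(1, len(a)) if i - k >= 0)
    x_from_recursion[i] = acc / a[0]
```

A different choice of ROC would pair the same expansion with left-sided terms, so the ROC assumption matters for the final sequence.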


Power Series Expansion 


Recall that
Equation:
$$\begin{aligned} X(z) &= \sum_{n=-\infty}^{\infty} x[n] z^{-n} \\ &= \cdots + x[-2] z^{2} + x[-1] z + x[0] + x[1] z^{-1} + x[2] z^{-2} + \cdots \end{aligned}$$


If we know the coefficients for the Laurent series expansion of X(z), then 
these coefficients give us the inverse z-transform. 


Example: 


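Suppose
Equation:
$$\begin{aligned} X(z) &= z^2 \left( 1 - \tfrac{1}{2} z^{-1} \right) \left( 1 + z^{-1} \right) \left( 1 - z^{-1} \right) \\ &= z^2 - \tfrac{1}{2} z - 1 + \tfrac{1}{2} z^{-1}. \end{aligned}$$
Then
Equation:
$$x[n] = \delta[n+2] - \tfrac{1}{2} \delta[n+1] - \delta[n] + \tfrac{1}{2} \delta[n-1].$$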


Example: 
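Suppose
Equation:
$$X(z) = \log \left( 1 + a z^{-1} \right), \quad |z| > |a|$$
where $\log$ denotes the complex logarithm. Recalling the Laurent series
expansion
Equation:
$$\log(1 + x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1} x^n}{n}, \quad |x| < 1,$$
we can write
Equation:
$$X(z) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1} a^n}{n} z^{-n}.$$
Thus we can infer that
Equation:
$$x[n] = \begin{cases} \dfrac{(-1)^{n+1} a^n}{n}, & n \ge 1 \\ 0, & n \le 0. \end{cases}$$
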

Fourier Representations 




Throughout the course we have been alluding to various Fourier 
representations. We first recall the appropriate transforms: 


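• Fourier Series (CTFS) $x(t)$: continuous-time, finite/periodic on $[-\pi, \pi]$
Equation:
$$X[k] = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} x(t) e^{-jkt} \, dt$$
Equation:
$$x(t) = \frac{1}{\sqrt{2\pi}} \sum_{k=-\infty}^{\infty} X[k] e^{jkt}$$


• Discrete-Time Fourier Transform (DTFT) $x[n]$: infinite, discrete-time
Equation:
$$X(e^{j\omega}) = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} x[n] e^{-j\omega n}$$
Equation:
$$x[n] = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} X(e^{j\omega}) e^{j\omega n} \, d\omega$$


• Discrete Fourier Transform (DFT) $x[n]$: finite, discrete-time
Equation:
$$X[k] = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x[n] e^{-j \frac{2\pi}{N} kn}$$
Equation:
$$x[n] = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X[k] e^{j \frac{2\pi}{N} kn}$$


• Continuous-Time Fourier Transform (CTFT) $x(t)$: infinite,
continuous-time
Equation:
$$X(\Omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x(t) e^{-j\Omega t} \, dt$$
Equation:
$$x(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} X(\Omega) e^{j\Omega t} \, d\Omega$$
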

We will think of Fourier representations in two complementary senses:


1. “Eigenbasis” representations: Each Fourier transform pair is very 
naturally related to an appropriate class of LTI systems. In some cases 
we can think of a Fourier transform as a change of basis. 

2. Unitary operators: While we often use Fourier transforms to analyze 
certain operators, we can also think of a Fourier transform as itself 
being an operator. 


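[Figure: relationships among the Fourier representations. DFT/IDFT: finite, discrete-time. DTFT (equivalently the inverse CTFS): from infinite, discrete-time to finite, continuous-time. CTFS (equivalently the inverse DTFT): from finite, continuous-time to infinite, discrete-time. The remaining corner is infinite, continuous-time.]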


Normalized DTFT as an Operator 




Note that by taking the DTFT of a sequence we get a function defined on $[-\pi, \pi]$.
In vector space notation we can view the DTFT as an operator (transformation).
In this context it is useful to consider the normalized DTFT

Equation:
$$\mathcal{F}(x) := X(e^{j\omega}) = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} x[n] e^{-j\omega n}.$$


One can show that the summation converges for any $x \in \ell_2(\mathbb{Z})$, and yields a
function $X(e^{j\omega}) \in L_2[-\pi, \pi]$. Thus,
Equation:
$$\mathcal{F} : \ell_2(\mathbb{Z}) \to L_2[-\pi, \pi]$$


can be viewed as a linear operator!


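Note: It is not at all obvious that $\mathcal{F}$ can be defined for all $x \in \ell_2(\mathbb{Z})$. To show
this, one can first argue that if $x \in \ell_1(\mathbb{Z})$, then


Equation:
$$\begin{aligned} \left| X(e^{j\omega}) \right| &= \left| \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} x[n] e^{-j\omega n} \right| \\ &\le \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} |x[n]| \left| e^{-j\omega n} \right| \\ &= \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} |x[n]| < \infty. \end{aligned}$$


For an $x \in \ell_2(\mathbb{Z}) \setminus \ell_1(\mathbb{Z})$, one must show that it is always possible to construct a
sequence $x_k \in \ell_2(\mathbb{Z}) \cap \ell_1(\mathbb{Z})$ such that
Equation:
$$\lim_{k \to \infty} \| x_k - x \|_2 = 0.$$


This means $\{ x_k \}$ is a Cauchy sequence, so that since $\ell_2(\mathbb{Z})$ is a Hilbert space, the
limit exists (and is $x$). In this case
Equation:
$$X(e^{j\omega}) = \lim_{k \to \infty} X_k(e^{j\omega}).$$


So for any $x \in \ell_2(\mathbb{Z})$, we can define $\mathcal{F}(x) = X(e^{j\omega})$, where
$X(e^{j\omega}) \in L_2[-\pi, \pi]$.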


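Can we always get the original $x$ back? Yes, the DTFT is invertible:
Equation:
$$\mathcal{F}^{-1}(X) = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} X(e^{j\omega}) e^{j\omega n} \, d\omega.$$


To verify that $\mathcal{F}^{-1}(\mathcal{F}(x)) = x$, observe that
Equation:
$$\begin{aligned} \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} \left( \frac{1}{\sqrt{2\pi}} \sum_{k=-\infty}^{\infty} x[k] e^{-j\omega k} \right) e^{j\omega n} \, d\omega &= \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} x[k] \int_{-\pi}^{\pi} e^{-j\omega(k-n)} \, d\omega \\ &= \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} x[k] \cdot 2\pi \delta[n-k] \\ &= x[n]. \end{aligned}$$


One can also show that for any $X \in L_2[-\pi, \pi]$, $\mathcal{F}(\mathcal{F}^{-1}(X)) = X$.


Operators that satisfy this property are called unitary operators or unitary
transformations. Unitary operators are nice! In fact, if $A : X \to Y$ is a unitary
operator between two Hilbert spaces, then one can show that

Equation:
$$\langle x_1, x_2 \rangle = \langle A x_1, A x_2 \rangle \quad \forall \, x_1, x_2 \in X,$$


i.e., unitary operators obey Plancherel's and Parseval's theorems!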


Fourier Transforms as Unitary Operators 




We have just seen that the DTFT can be viewed as a unitary operator
between $\ell_2(\mathbb{Z})$ and $L_2[-\pi, \pi]$. One can repeat this process for each Fourier
transform pair. In fact due to the symmetry between the DTFT and the
CTFS, we have already established this for CTFS, i.e.,

Equation:
$$\text{CTFS} : L_2[-\pi, \pi] \to \ell_2(\mathbb{Z})$$


is a unitary operator. Similarly, we have
Equation:
$$\text{CTFT} : L_2(\mathbb{R}) \to L_2(\mathbb{R})$$


is a unitary operator as well. The proof of this fact closely mirrors the proof
for the DTFT. Finally, we also have
Equation:
$$\text{DFT} : \mathbb{C}^N \to \mathbb{C}^N.$$


This operator is also unitary, which can be easily verified by showing that
the DFT matrix is actually a unitary matrix: $U^H U = U U^H = I$.
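
The DFT case is small enough to verify by brute force. This Python/NumPy sketch builds the normalized DFT matrix for a hypothetical size $N = 8$, checks $U^H U = I$, and confirms that inner products are preserved. The random test vectors are for illustration only.

```python
import numpy as np

N = 8
n = np.arange(N)

# Normalized DFT matrix: U[k, n] = exp(-j 2 pi k n / N) / sqrt(N).
U = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

identity_error = np.linalg.norm(U.conj().T @ U - np.eye(N))

# Unitary operators preserve inner products (and hence energy).
rng = np.random.default_rng(1)
x1 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x2 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
inner_time = np.vdot(x1, x2)
inner_freq = np.vdot(U @ x1, U @ x2)

# NumPy's FFT with norm="ortho" applies the same normalized DFT.
fft_matches = np.allclose(U @ x1, np.fft.fft(x1, norm="ortho"))
```

Note that NumPy's default FFT omits the $1/\sqrt{N}$ factor, so it is unitary only up to scaling unless `norm="ortho"` is passed.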


Note that this discussion only applies to finite-energy ($\ell_2$/$L_2$) signals.
Whenever we talk about infinite-energy functions (things like the unit step, 
delta functions, the all-constant signal) having a Fourier transform, we need 
to be very careful about whether we are talking about a truly convergent 
Fourier representation or whether we are merely using an engineering 
“trick” or convention. 


The DTFT as an “Eigenbasis” 




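We saw Parseval/Plancherel in the context of orthonormal basis expansions.
This raises the question, do $\mathcal{F}$ and $\mathcal{F}^{-1}$ just take signals and compute their
representation in another basis?


Let's look at $\mathcal{F}^{-1}$ first:
Equation:
$$\mathcal{F}^{-1}\left( X(e^{j\omega}) \right) = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} X(e^{j\omega}) e^{j\omega n} \, d\omega.$$
Recall that $X(e^{j\omega})$ is really just a function of $\omega$, so if we replace $\omega$ with $t$,
we get
Equation:
$$\mathcal{F}^{-1}(X(t)) = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} X(t) e^{jtn} \, dt.$$

Does this seem familiar? If $X(t)$ is a periodic function defined on $[-\pi, \pi]$,
then $\mathcal{F}^{-1}(X(t))$ is just computing (up to a reversal of the indices) the
continuous-time Fourier series of $X(t)$!


We said before that the Fourier series is a representation in an orthobasis;
the sequence of coefficients that we get are just the weights of the different
basis elements. Thus we have $x[n] = \mathcal{F}^{-1}(X(t))$ and

Equation:
$$X(t) = \sum_{n=-\infty}^{\infty} x[n] \frac{e^{-jtn}}{\sqrt{2\pi}}.$$


What about $\mathcal{F}$? In this case we are taking an $x \in \ell_2(\mathbb{Z})$ and mapping it to
an $X \in L_2[-\pi, \pi]$. $X(e^{j\omega})$ represents an infinite set of numbers, and when we
weight the functions $e^{j\omega n} / \sqrt{2\pi}$ by $X(e^{j\omega})$ and sum them all up, we get back the
original signal

Equation:
$$x[n] = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} X(e^{j\omega}) e^{j\omega n} \, d\omega.$$
Unfortunately, $e^{j\omega n} / \sqrt{2\pi} \notin \ell_2(\mathbb{Z})$, so technically, we can't really think of
this as a change of basis.


However, as a unitary transformation, $\mathcal{F}$ has everything we would ever
want in a basis and more: We can represent any $x \in \ell_2(\mathbb{Z})$ using
$\{ e^{j\omega n} / \sqrt{2\pi} \}_{\omega \in [-\pi, \pi]}$, and since it is unitary, we have Parseval and Plancherel
Theorems as well. On top of that, we already showed that the set of vectors
$\{ e^{j\omega n} \}_{\omega \in [-\pi, \pi]}$ are eigenvectors of LSI systems; if this really were a basis,
it would be called an eigenbasis.


Eigenbases are useful because once we represent a signal using an
eigenbasis, to compute the output of a system we just need to know what it
does to its eigenvectors (i.e., its eigenvalues). For an LSI system, $H(e^{j\omega})$
represents a set of eigenvalues that provide a complete characterization of
the system.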


Eigenbases and LSI Systems 


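Why is an eigenbasis so useful? It allows us to greatly simplify the
computation of the output for a given input. For example, suppose that $X$ is
a vector space and that $L : X \to X$ is a linear operator with eigenvectors
$\{ v_k \}_{k \in \Gamma}$. If $\{ v_k \}_{k \in \Gamma}$ form a basis for $X$, then for any $x \in X$ we can write
$x = \sum_{k \in \Gamma} c_k v_k$. In this case we have that

Equation:
$$\begin{aligned} y = Lx &= L \left( \sum_{k \in \Gamma} c_k v_k \right) \\ &= \sum_{k \in \Gamma} c_k L(v_k) \\ &= \sum_{k \in \Gamma} c_k \lambda_k v_k, \end{aligned}$$

where $\lambda_k$ denotes the eigenvalue corresponding to $v_k$.


In the case of a DT, LSI system $H$, we have that $\frac{1}{\sqrt{2\pi}} e^{j\omega n}$ is an
eigenvector of $H$ and for any $x[n]$ we can write
Equation:
$$x[n] = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} X(e^{j\omega}) e^{j\omega n} \, d\omega.$$


From the same line of reasoning as above, we have that
Equation:
$$y[n] = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} H(e^{j\omega}) X(e^{j\omega}) e^{j\omega n} \, d\omega.$$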


Whenever we have an eigenbasis, we can represent our operator as simply a 
diagonal operator when the input and output vectors are represented in the 
eigenbasis. The fact that convolution in time is equivalent to multiplication 
in the Fourier domain is just one instance of this phenomenon. Moreover, 
while we have been focusing primarily on the DTFT, it should now be clear 
that each Fourier representation forms an eigenbasis for a specific class of 
operators, each of which defines a particular kind of convolution. 
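
The finite-dimensional version of this statement can be seen concretely: a circular-convolution operator is a circulant matrix, and the normalized DFT matrix diagonalizes it. The sketch below uses Python with NumPy, a hypothetical impulse response, and $N = 6$.

```python
import numpy as np

N = 6
h = np.array([1.0, 0.5, 0.25, 0.0, 0.0, 0.0])   # hypothetical impulse response

# Circulant matrix: column k is h circularly shifted down by k samples.
C = np.column_stack([np.roll(h, k) for k in range(N)])

# Normalized DFT matrix, playing the role of the change of basis.
n = np.arange(N)
U = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

# In the Fourier basis the operator becomes diagonal ...
D = U @ C @ U.conj().T
off_diagonal = D - np.diag(np.diag(D))

# ... and its diagonal holds the eigenvalues, the (unnormalized) DFT of h.
eigenvalues = np.diag(D)
```

Applying `C` in the Fourier basis therefore amounts to one multiplication per frequency, which is the matrix form of "convolution becomes multiplication".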


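• DTFT: discrete-time convolution (infinite)
• CTFT: continuous-time convolution (infinite)
Equation:
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau) g(t - \tau) \, d\tau$$
• DFT: discrete-time circular convolution


Equation:
$$(x \circledast y)[n] = \sum_{k=0}^{N-1} x[k] \, y[(n - k) \bmod N]$$


• CTFS: continuous-time circular convolution
Equation:
$$(f \circledast g)(t) = \int_{-\pi}^{\pi} f(\tau) \, g(t - \tau) \, d\tau,$$
where $g$ is extended periodically, i.e., $t - \tau$ is taken modulo $2\pi$.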


This is the main reason why we have to care about circular convolution. It is 
something that one would almost never want to do — but if you multiply two 
DFTs together you are doing it implicitly, so be careful and remember what 


it is doing. 
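
The wrap-around is easy to see on a small example. This Python/NumPy sketch uses hypothetical signals and compares the product of two DFTs with a direct circular convolution and with the ordinary linear convolution. Zero-padding to length at least $2N - 1$ is the standard way to recover the linear result.

```python
import numpy as np

def circular_convolve(x, y):
    """Direct N-point circular convolution: sum_k x[k] y[(n - k) mod N]."""
    N = len(x)
    return np.array([sum(x[k] * y[(n - k) % N] for k in range(N)) for n in range(N)])

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, -1.0, 0.5, 0.0])

# Multiplying DFTs implements circular, not linear, convolution.
via_dft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)).real
direct_circular = circular_convolve(x, y)
linear = np.convolve(x, y)                       # length 2N - 1

# Zero-padding both signals to length >= 2N - 1 removes the wrap-around.
M = len(x) + len(y) - 1
via_padded_dft = np.fft.ifft(np.fft.fft(x, M) * np.fft.fft(y, M)).real
```

Without padding, the tail of the linear convolution wraps around and adds onto the first samples, so the two results differ.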


